Open science, open-source, and open data: Collaboration as an emergent property?
 

Talk I gave as part of the panel "How will cyberinfrastructure capabilities shape the future of scientific collaboration?" at the Cyberinfrastructure for Collaborative Science workshop, held at the National Evolutionary Synthesis Center (NESCent), May 18-20, 2011.

More information about the workshop at
https://www.nescent.org/wg_collabsci/2011_Workshop


Usage Rights

CC Attribution License

  • Describing how the work of the summer interns got picked up by others while it was happening, based on the web presence of their evolving outputs. For example, multiple outside groups stepped in to flesh out and maintain the database of funder policies on data archiving that Nic launched. Valerie’s resource pages on data citation led to students being invited by DataCite to contribute to the metadata model behind data DOIs. The interns also posted regularly to a project blog and shared their bibliographies on Mendeley, and their end-of-summer poster was likely seen by a much larger number of people online than at the conference, based on the Twitter buzz we observed.
    Downsides: There was fear (founded or not) that having material out there representing DataONE that might be seen as shoddy, or embarrassing to publishers, etc., would harm the organization's reputation, and it made us add more disclaimers/qualifiers to the web documentation. We discovered some weaknesses in the platforms we used (e.g. OpenWetWare). This particular batch of interns didn't seem to be intimidated. The open communication also led to some efficiencies with the larger mentor group; I'm not aware of any inefficiencies that were introduced. The blog may not always have been the best investment of time from a research-productivity standpoint, but it was a useful learning experience for the students nonetheless.
  • “Researchers have found many reasons for making their notebook open: better collaborations, greater visibility and broader impact for funding requirements, sharing detailed methods, solving the dilemma of null results publishing, helping other scientists advance, demonstrating the viability of the idea or participation in a community of open science.”
  • Dryad provides a way to make data available not just for validation, but also for new method development, meta-analysis, and other types of synthetic science; it also gives researchers credit for their data as a first-class scholarly product.
  • An example of how valuable such archived data can be. NESCent postdoc Amy Zanne and colleagues led a group that in 2009 deposited to Dryad a dataset compiling wood anatomy data from 8412 plant species. This dataset has already been downloaded over 600 times! While some of these downloads may lead to citations, there is probably a good deal of data reuse for educational purposes, and likely also exploration of analytical methods on this unique dataset.
    The inset from the corresponding Ecology Letters article shows the geographical distribution of wood density in North and South America. Each data point is the mean wood density value of all unique species occurrences in that cell. Wood density clearly varies in a very predictable way with temperature, precipitation, and seasonality. Dryad contains the data underlying this figure, but without Dryad, researchers would be unable to reconstruct the original data from this image for testing new hypotheses.
  • Contrast this with the newly announced TRY database, a similar community compilation of trait data, where access is guarded by a formidable two-tiered process.
  • And contrast that with another model for protecting the professional needs of researchers in their published data. Many of the Dryad partner journals allow authors to embargo data for a limited period, but they must archive it publicly upon publication, and reuse cannot be controlled after the embargo has been lifted. This is consistent with the idea that the original authors have a conflict of interest in the reuse of that data after publication, and should not be empowered to suppress independent reanalysis of the data any more than of the contents of the article itself. Dryad supports one-year, no-questions-asked embargoes. In this case, a published dataset from some of the TRY contributors has been embargoed for three years at the discretion of the journal editor.
  • Mailing list traffic for GBrowse, for example, has grown from fewer than 50/month in 2007 to sometimes over 400/month in 2010.
    We know from registrations that there have been over 200 new installations of GBrowse per month.
  • 8 participating R packages: Ade4, Ape, apTreeshape, Geiger, Laser, OUCH, PaleoTS, Picante.
    Before the hackathon: 1/8 were using version control (but none public), 1/8 had a mailing list.

Presentation Transcript

  • Open science, open-source, and open data: Collaboration as an emergent property? Cyberinfrastructure for Collaborative Science, a workshop at NESCent, Durham, NC, May 18-20. Hilmar Lapp and Todd Vision, NESCent
  • “Openness is the new flower power” “We create the scholarship. We create the meta-data. We create the tools. We can reclaim and reinvent the way that scientific scholarship is created and disseminated.” Peter Murray-Rust http://blogs.ch.cam.ac.uk/pmr/2010/08/02/flowerpoint-step-by-step/
  • E-access and interoperability are enabling factors Sidlauskas et al (2010)
  • What does “open” mean? “A piece of content or data is open if anyone is free to use, reuse, and redistribute it, subject only, at most, to the requirement to attribute and share-alike.”
  • Open Notebook Science: An experiment within the DataONE Summer Internship Program
  • Open e-collaboration sites
  • Dryad: archiving, finding, and sharing open data
    • Data archived at publication
    • Data given persistent GUIDs
    • Option for limited-term embargo
    • Metadata searchable
    • Easy submission process
    • API for data exchange
    • Data may be peer reviewed
    • Capacity for versioning/updates
    • Persistent link from paper to data
    • Long-term preservation
    • Data in public domain (CCZero)
    • Community governance
  • 612 downloads. Amy Zanne
  • Two-layered process for gaining access to data
    • “Individuals or groups of individuals that would like to use the TRY database for a scientific project […] should submit a proposal to TRY”
    • “TRY will give the proponent the list of individuals […] that will necessarily have to be […] consulted […]”
  • Another model for controlled sharing:
    • Data description published
    • Data embargoed 1 (or 3) yrs post publication
    • Afterwards it is released under CCZero
  • Open Source: The value of community. Example: Generic Model Organism Database. [Chart: web traffic and mailing list membership by mailing list, 2008 vs. 2010]
  • Community participation in Phenoscape
    • Anatomy Ontology Contributors: 25
    • Taxonomy Ontology Contributors: 11
    • Community data annotators: 10
    • Workshop Participants: ~75
    • Software testing volunteers: 33
    • Internship students: 13
    • Phenotype Ontology RCN: http://phenotypercn.org
  • Sustaining informatics resources over the long term
    • 875 modules in core, >422,000 lines of code
    • Most widely used Perl toolkit in the life sciences
    • Active for 16 years
    • No direct grant funding
    • Continues to recruit new contributors
    • Leadership baton passed 5 times, dozens of committers
    • Stajich et al (2002) Genome Biology: cited 509x
  • Open development: Social coding
  • Adopting open development influenced by incentives. Comparative Methods in R Hackathon, Dec 2007
  • Steve Kembel, Matt Helmus
  • Summary
    • The principles in common are freedom to
      • Reuse
      • Modify and recombine
      • Redistribute
    • These are also fundamental to enabling synthetic collaborative science.
  • Will openness make us better collaborators?
  • Will openness make us better collaborators? Hum Nat (2007) 18:88–108
  • Whenever you do a thing, act as if all the world were watching. Thomas Jefferson
  • Acknowledgments
    • NESCent Directors and resident scientists
    • NESCent Informatics team: R. Scherle, J. Balhoff, C. Kothari, D. Clements, X. Liu, V. Gapeyev, J. Auman
    • Dryad team, UNC/MRC collaborators, including Heather Piwowar
    • Phenoscape team (PIs P. Mabee, M. Westerfield, T. Vision), curators, and other collaborators
    • EvoInfo Working Group participants
    • Participants and co-organizers of 5 hackathons
    • Google OSPO, Google Summer of Code students & mentors
    • The O|B|F community and fellow Bio* developers
    • Comp. Phyloinformatics Summer Course students & instructors