DataONE Education Module 02: Data Sharing

  • 305 views
Uploaded on

Lesson 2 in a set of 10 created by DataONE on Best Practices fo Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released …

Lesson 2 in a set of 10 created by DataONE on Best Practices fo Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.

More in: Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
305
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
0
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • This is Lesson 2 of the DataONE Data Management learning series. This lessoncovers Data Management: Data Sharing
  • The topics covered in this lesson include: the role of data sharing within the lifecycle, the value of data sharing, concerns about data sharing, and methods for making data sharable.
  • After completing this lesson, you will be able to recognize the benefits of sharing scientific data, address concerns about sharing data, outline a process for making data sharable, and identify mechanisms for sharing data.
  • The Data Life Cycle is a continuum of data development, manipulation, management, and storage stages.
  • Data sharing should be addressed throughout the Data Life Cycle1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
  • Effective data sharing requires careful thought during each stage of the data development process including: description and documentation of the data process, content, and character; deposition and storage of the data in a location from which it can be accessed or shared; preservation of the data using a format and media that enable long term reuse; and making the data discoverable by publishing information about the data in research publications, data clearinghouses and data distribution portals.
  • Why expend the extra effort to share data? Because it benefits the public, the research sponsor, the research community and, perhaps most importantly, the researcher.
  • How does the public benefit from shared research? The more informed the public the better they are able to understand and contribute toward effective public and personal decisions:environmental and economic planning federal, state and local policiessocial choices such as voting, use of tax dollars and education optionspersonal lifestyles and health choices such as exercise, smoking, and nutrition2Australian Bureau of Statistics, National Statistical Service (2009) A good practice guide to sharing your data with others, Vers. 1. http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002FD676?opendocument8Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
  • Why do research sponsors encourage data sharing? Because sponsors have an obligation to maximize the investment of research dollars.Data sharing enhances the value of the research investment byenabling external reviewers to verify the project performance metrics and outcomes. This not only increases the credibility of the data but also spurs new research that canbuild upon the initial investment and advance the science rather than duplicate expenditures. 1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)3 Piwowar, H.A. (2011). A new task for NSF reviewers:  Recognizing the value of data reuse. http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/
  • The scientific community as a whole also benefits from sharing among researchers. Data sharing allows researchers to build upon one another’s work and to further, rather than duplicate, the science by exploring new findings or combining findings into meta analyses that cannot be performed with individual data. In sharing data, the scientific community expands both individual perspectives and the collective comprehension.1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.00501835Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y 6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • Access to related research enables members of the scientific community to better reproduce, compare and assess methods and results. Scientists are able to learn from one another and educate new researchers as to the most current and significant findings. 1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.00501835Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y 6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • And finally, how does the independent researcher benefit from data sharing? When scientists share their data, they gain recognition as an authoritative source and respect as a wise investment for research dollars. When data are exposed, feedback from the broader community can be used to improve the quality and presentation of the data. Shared data also allows for greater opportunity for data exchange and networking opportunities with peers and potential collaborators.4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.00501835Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y 9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.
  • Researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately. They may also be concerned about maintaining the confidentiality and security of sensitive data. Business concerns may arise as well. Will data users give proper credit and acknowledgement to the scientist? Will the scientist lose a competitive advantage by sharing this valuable resource? 8Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • Each of these issues can, in great part, be addressed by providing rich data documentation known as ‘metadata’.8Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • By providing metadata, the research scientist establishes the purpose, methods, sources and parameters of the data. As such, data users are given the information necessary to appropriately apply, protect and cite the data. If the metadata contains information about proprietary data processing or analysis techniques, the competitive advantage can be maintained by creating a second, more generalized, metadata record for public distribution.
  • The more robust your metadata, the easier your data will be discovered and the more appropriately it will be applied. Making data sharable involves the following steps. Step 1: Create robust metadata that will be discoverable. Be specific in regards to geography and time periods. Use discipline specific themes, place names, and keywords. Describe any attributes and include links to associated data catalogues, data downloads, and project websites.
  • Step 2: Be sure to include archival and reference information with properly formatted data citations for sources and content. Include persistent data identifiers and any related metadata identifiers. References:http://en.wikipedia.org/wiki/Universally_unique_identifierhttp://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html
  • Step 3: Be sure to have data contributors review their metadata to ensure validity and organizational correctness. Are the processes correct? Is your contribution adequately represented and reflected? Is your organization properly recognized and is the funding organization properly recognized? Be sure to get management and sponsor approval on the data publication including the content, presentation, and manner in which contributors are identified.
  • Step 4: Publish your metadata in data portals and clearinghouses. Seek out relevant government portals and portals developed by specific communities of practice.
  • In addition to Data Clearinghouses and portals, consider publishing your metadata at project or program websites that describe program activities. You can place links within online lessons, outreach products, and into training related resources. For example one can publish their metadata to a WAF, or Web-Accessible folder, that enables direct search and discovery without going through a portal or clearinghouse application. Community or Public Clouds are similar to WAFs. Public and communal clouds offer generous bandwidths and a shared storage system which provides an effective method for data sharing. 4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.00501837Roxana Geambasu, Steven D. Gribble, Henry M. Levy. "CloudViews: Communal Data Sharing in Public Clouds." In Proceedings of the First USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, USA, June 2009. Paper: [PDF]; Presentation: [PPT],[PDF]; [BIBTEX].
  • Data Sharing ethics require researchers to follow some basic best practices.  First, make sure your data is available by documenting and publishing your data using standard formats and protocols. Second, promote data use via presentations and meetings and make others aware of your data and findings. Third, solicit and be open to feedback from data users and address identified issues to improve your data. Lastly, monitor publications and websites for data use and address any misapplications found.1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
  • In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain. The goal of this project was to do research on a massive scale that would involve sharing and making accessible all the data available to anyone in the world with a computer. Dr. John Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania, stated “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.” [Our “Data in Real Life” segment features a news report on this research. (?)]
  • In summary, data sharing adds value to the data. As such, it is the responsibility of the researcher to share their data. Metadata should be created for data resources to support data accountability, liability and usability. Research sponsors expect, and increasingly require, data to be shared. THIS CONCLUDES Lesson 2. To assess your learning on the content presented in this lesson, proceed to the next slide take the quiz.

Transcript

  • 1. Data Sharing Lesson 2: Data Sharing Photo by Michelle Chang. All Rights Reserved
  • 2. • Data Sharing Within the Data Lifecycle • Value of Data Sharing • Concerns About Data Sharing • Methods for Making Data Sharable CC image by Kevin Byron on FlickrData Sharing
  • 3. • After completing this lesson, the participant will be able to: o discuss data sharing considerations within the data life cycle o recognize the benefits of sharing scientific data o address concerns about sharing data o outline a process for making data sharable o identify mechanisms for sharing data CC image by PictureYouth on FlickrData Sharing
  • 4. • Data sharing should be Plan addressed throughout the data life cycle1 Analyze Collect Integrate Assure Discover Describe PreserveData Sharing
  • 5. • Several stages require critical attention to ensure effective data sharing Describe document the data content, character and process Deposit store the data in a location from which it can be accessed Preserve select storage formats and media with long term use in mind Discover publish information about the data so that others can find itData Sharing
  • 6. Data sharing requires effort, resources, and faith in others. Why do it? For the benefit of: CC image by Jessica Lucia on Flickr o the public o the research sponsor o the research community o the researcherData Sharing
  • 7. A better informed public yields better decision making with regard to: o Environmental and economic planning o Federal, state, and local policies o social choices such as use of tax dollars and education options o personal lifestyle and health such as nutrition and recreation CC image by falonyates on FlickrData Sharing
  • 8. • Organizations that sponsor research must maximize the value of research dollars • Data sharing enhances the value of research investments by enabling: o verification of performance metrics and outcomes o new research and increased return on investment o advancement of the science o reduced data duplication expendituresData Sharing
  • 9. Access to related research enables community members to: o build upon the work of others and further, rather than repeat, the science4 o perform meta analyses that cannot be performed with individual datasets or laboratories5 o share resources and perspectives so that comprehension is expanded and enhanced5 CC image by Lawrence Berkeley National Laboratory on FlickrData Sharing
  • 10. Access to related research enables community members to (cont’d): o increase transparency, reproducibility and comparability of results5 o expand methodology assessment, recommendations and improvement6 o educate new researchers as to the most current and significant findings6Data Sharing
  • 11. Scientists that share data gain the benefit of: o research sponsor recognition as an authoritative source and wise investment o improved data quality due to expanded use, field checks, and feedback o greater opportunity for data exchange o improved connections to scientific network, peers, and potential collaborators CC image by SLU Madrid Campus on FlickrData Sharing
  • 12. Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.Data Sharing CC image by CyberHades on Flickr
  • 13. Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data lack of acknowledgement / credit loss of advantage when competing for research dollarsData Sharing
  • 14. Concern Solution inappropriate use due to misunderstanding of research metadata purpose or parameters security and confidentiality of sensitive data metadata lack of acknowledgement / credit metadata loss of advantage when competing for research dollars metadataData Sharing
  • 15. Concern Solution inappropriate use due to provide rich Abstract, Purpose, misunderstanding of research Use Constraints and Supplemental purpose or parameters Information where needed • the metadata does NOT security and confidentiality of contain the data sensitive data • Use Constraints specify who may access the data and how specify a required data citation lack of acknowledgement / credit within the Use Constraints loss data insight and competitive create second, public version with advantage when vying for generalized Data Processing research dollars DescriptionData Sharing
  • 16. Step One: Create robust metadata that is discoverable o specify geography and time periods o use discipline specific theme, place and temporal keywords, thesauri, and ontologies o describe attributes o include links to associated data catalogues, data downloads, project websites, etc.Data Sharing
  • 17. Step Two: Include archival and reference information o properly formatted data citations for the data and all sources o Universally Unique Identifiers (UUID) that uniquely identify your data and help to link the data with the metadata See the DataONE unique identifier guidance at: http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html Data Citation Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20Data Sharing
  • 18. Step Three: Have data contributors review your metadata to ensure validity and organizational ‘correctness’? o are the processes described accurately? o are all contributions adequately identified? o has management reviewed the product and documentation? o is the funding organization properly recognized?Data Sharing
  • 19. Step Four: Publish your metadata via: Data Portals / Clearinghouses Federal o geodata.gov o data.gov o USGS Core Science Metadata Clearinghouse (mercury.ornl.gov/clearinghouse) Communities of Practice o Knowledge Network for Biodiversity (KNB) Data Portal o Long Term Ecological Research (LETR) Network Data PortalData Sharing
  • 20. Step Four: Publish your metadata and/or data via: (cont’d) Other Online Resources: o Project and/or Program websites o Links within online lessons and outreach products o Web-accessible folders (WAF) o Community or Public CloudData Sharing
  • 21. • Document and publish data using standards • Promote data use via presentations and meetings • Solicit feedback from data users and address identified issues • Monitor publications and websites for data use and address misapplicationsData Sharing
  • 22. • In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the CC image by Andrew Mccluskey on Flickr biological markers that show the progression of Alzheimer’s disease in the human brain. • The goal of this project was to do research on a massive scale that would involve sharing and making accessible all the data uncovered to anyone in the world with a computer. • Dr. John Trojanowski an Alzheimer’s researcher at the University of Pennsylvania stated, “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.” • http://www.nytimes.com/2010/08/13/health/research/13alzheimer.htmlData Sharing
  • 23. • Data sharing adds value to the data • It is the responsibility of the researcher to share their data • Metadata supports data accountability, liability, and usability • Sponsors expect, some require, data to be shared • Data sharing is essential to the advancement of scienceData Sharing
  • 24. 1. Inter-university Consortium for Political and Social Research (ICPSR), ICPSR Guide to Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (ICPSR, 2009; http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf). [4th Edition] 2. Australian Bureau of Statistics - National Statistical Service (ABS-NSS), A good practice guide to sharing your data with others (ABS-NSS, 2009; http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002 FD676?opendocument). [Vers. 1] 3. H.A. Piwowar, A new task for NSF reviewers: Recognizing the value of data reuse. ResearchRemix vers. May 28, 2011 (http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/). [blog posting of draft] 4. H.A. Piwowar, M.J. Becich, H. Bilofsky, R.S. Crowley, Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med. 5(9), e183 (2008), doi:10.1371/journal.pmed.0050183. [on behalf of the caBIG Data Sharing and Intellectual Capital Workspace]Data Sharing
  • 25. 5. J.L. Teeters, K.D. Harris, K.J. Millman, B.A. Olshausen, F.T. Sommer, Data Sharing for Computational Neuroscience. Neuroinform (2008), DOI 10.1007s12021-008-9009-y. [http://redwood.berkeley.edu/fsommer/papers/teetersetal08.pdf] 6. National Institute of Health (NIH) “NIH Data Sharing Policy and Implementation Guidelines” (NIH, Washington D.C., 2003, http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm ). 7. R. Geambasu, S.D. Gribble, H. M. Levy, "CloudViews: Communal Data Sharing in Public Clouds" In Proceedings of the First USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, USA, June 2009. [Paper: PDF; Presentation: PPT, PDF] 8. J. Niu, “Reward and Punishment Mechanisms for Research Data Sharing”. IASSIST Quarterly, Winter (2006).Data Sharing
  • 26. 9. C.L. Borgman, “Research Data: Who will share what, with whom, when, and why?” in Proceedings of the China-North American Library Conference, Beijing , September 2010 (http://works.bepress.com/borgman/238/).Data Sharing
  • 27. The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Data Sharing. DataONE. Retrieved Nov12, 2012. From http://www.dataone.org/sites/all/documents/L02_DataSharing. pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.Data Sharing