DataONE Education Module 02: Data Sharing

906 views

Published on

Lesson 2 in a set of 10 created by DataONE on Best Practices fo Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
906
On SlideShare
0
From Embeds
0
Number of Embeds
38
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • This is Lesson 2 of the DataONE Data Management learning series. This lesson covers Data Management: Data Sharing
  • The topics covered in this lesson include: the role of data sharing within the lifecycle, the value of data sharing, concerns about data sharing, and methods for making data sharable.
  • After completing this lesson, you will be able to recognize the benefits of sharing scientific data, address concerns about sharing data, outline a process for making data sharable, and identify mechanisms for sharing data.

  • The Data Life Cycle is a continuum of data development, manipulation, management, and storage stages.
  • Data sharing should be addressed throughout the Data Life Cycle
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)


  • Effective data sharing requires careful thought during each stage of the data development process including:
    description and documentation of the data process, content, and character;
    deposition and storage of the data in a location from which it can be accessed or shared;
    preservation of the data using a format and media that enable long term reuse; and
    making the data discoverable by publishing information about the data in research publications, data clearinghouses and data distribution portals.


  • Why expend the extra effort to share data? Because it benefits the public, the research sponsor, the research community and, perhaps most importantly, the researcher.
  • How does the public benefit from shared research?
    The more informed the public the better they are able to understand and contribute toward effective public and personal decisions:
    environmental and economic planning
    federal, state and local policies
    social choices such as voting, use of tax dollars and education options
    personal lifestyles and health choices such as exercise, smoking, and nutrition
    2Australian Bureau of Statistics, National Statistical Service (2009) A good practice guide to sharing your data with others, Vers. 1. http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002FD676?opendocument
    8Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.



  • Why do research sponsors encourage data sharing? Because sponsors have an obligation to maximize the investment of research dollars.
    Data sharing enhances the value of the research investment by enabling external reviewers to verify the project performance metrics and outcomes. This not only increases the credibility of the data but also spurs new research that can build upon the initial investment and advance the science rather than duplicate expenditures.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)

    3 Piwowar, H.A. (2011). A new task for NSF reviewers:  Recognizing the value of data reuse. http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/



  • The scientific community as a whole also benefits from sharing among researchers. Data sharing allows researchers to build upon one another’s work and to further, rather than duplicate, the science by exploring new findings or combining findings into meta analyses that cannot be performed with individual data. In sharing data, the scientific community expands both individual perspectives and the collective comprehension.


    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
    5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
    6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)

  • Access to related research enables members of the scientific community to better reproduce, compare and assess methods and results. Scientists are able to learn from one another and educate new researchers as to the most current and significant findings.


    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
    5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
    6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
  • And finally, how does the independent researcher benefit from data sharing? When scientists share their data, they gain recognition as an authoritative source and respect as a wise investment for research dollars. When data are exposed, feedback from the broader community can be used to improve the quality and presentation of the data. Shared data also allows for greater opportunity for data exchange and networking opportunities with peers and potential collaborators.

    4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
    5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)






  • Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.
  • Researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately.

    8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)



  • They may also be concerned about maintaining the confidentiality and security of sensitive data.

    8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)



  • Business concerns may arise as well. Will data users give proper credit and acknowledgement to the scientist?

    8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)



  • Will the scientist lose a competitive advantage by sharing this valuable resource?

    8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)



  • 8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)




  • Each of these issues can, in great part, be addressed by providing rich data documentation known as ‘metadata’.
  • By providing metadata, the research scientist establishes the purpose, methods, sources and parameters of the data. As such, data users are given the information necessary to appropriately apply, protect and cite the data. If the metadata contains information about proprietary data processing or analysis techniques, the competitive advantage can be maintained by creating a second, more generalized, metadata record for public distribution.

  • The more robust your metadata, the easier your data will be discovered and the more appropriately it will be applied. Making data sharable involves the following steps.
    Step 1: Create robust metadata that will be discoverable. Be specific in regards to geography and time periods. Use discipline specific themes, place names, and keywords. Describe any attributes and include links to associated data catalogues, data downloads, and project websites.
  • Step 2: Be sure to include archival and reference information with properly formatted data citations for sources and content. Include persistent data identifiers and any related metadata identifiers.

    References:
    http://en.wikipedia.org/wiki/Universally_unique_identifier
    http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html

  • Step 3: Be sure to have data contributors review their metadata to ensure validity and organizational correctness. Are the processes correct? Is your contribution adequately represented and reflected? Is your organization properly recognized and is the funding organization properly recognized? Be sure to get management and sponsor approval on the data publication including the content, presentation, and manner in which contributors are identified.
  • Step 4: Publish your metadata in data repositories and catalogs. Seek out relevant government catalogs and repositories developed by specific communities of practice. Look to your institution for data preservation services. Note, data published in disciplinary specific repositories are more like to be found by those in the same community.

  • Data Sharing ethics require researchers to follow some basic best practices.
     
    First, make sure your data is available by documenting and publishing your data using standard formats and protocols.
    Second, promote data use via presentations and meetings and make others aware of your data and findings. Third, solicit and be open to feedback from data users and address identified issues to improve your data. Lastly, monitor publications and websites for data use and address any misapplications found.

    1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
    4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
  • In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain.
    The goal of this project was to do research on a massive scale that would involve sharing and making accessible all the data available to anyone in the world with a computer.
    Dr. John Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania, stated “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.”
    [Our “Data in Real Life” segment features a news report on this research. (?)]
  • In summary, data sharing adds value to the data. As such, it is the responsibility of the researcher to share their data. Metadata should be created for data resources to support data accountability, liability and usability. Research sponsors expect, and increasingly require, data to be shared.
     
    THIS CONCLUDES Lesson 2.
    To assess your learning on the content presented in this lesson, proceed to the next slide take the quiz.
  • DataONE Education Module 02: Data Sharing

    1. 1. Data Sharing Lesson 2: Data Sharing PhotobyMichelleChang.AllRightsReserved
    2. 2. Data Sharing • Data Sharing Within the Data Lifecycle • Value of Data Sharing • Concerns About Data Sharing • Methods for Making Data Sharable CCimagebyKevin ByrononFlickr
    3. 3. Data Sharing • After completing this lesson, the participant will be able to: o discuss data sharing considerations within the data life cycle o recognize the benefits of sharing scientific data o address concerns about sharing data o outline a process for making data sharable o identify mechanisms for sharing data CCimagebyPictureYouthonFlickr
    4. 4. Data Sharing • Data sharing should be addressed throughout the data life cycle1 Plan Collect Assure Describe Preserve Discover Integrate Analyze
    5. 5. Data Sharing • Several stages require critical attention to ensure effective data sharing document the data content, character and process Describe store the data in a location from which it can be accessed Deposit select storage formats and media with long term use in mind Preserve publish information about the data so that others can find it Discover
    6. 6. Data Sharing Data sharing requires effort, resources, and faith in others. Why do it? For the benefit of: o the public o the research sponsor o the research community o the researcher CCimagebyJessicaLuciaonFlickr
    7. 7. Data Sharing A better informed public engenders trust in science research and can yield better decision making with regard to: o Environmental and economic planning o Federal, state, and local policies o Social choices such as use of tax dollars and education options o Personal lifestyle and health such as nutrition and recreation CCimagebyfalonyatesonFlickr
    8. 8. Data Sharing • Organizations that sponsor research must maximize the value of research dollars • Data sharing enhances the value of research investments by enabling: o verification of performance metrics and outcomes o new research and increased return on investment o advancement of the science o reduced data duplication expenditures o enhancing and extending the record of science
    9. 9. Data Sharing Access to related research enables community members to: o build upon the work of others and further, rather than repeat, the science4 o More easily engage in interdisciplinary research o perform meta analyses that cannot be performed with individual datasets or laboratories5 o share resources and perspectives so that comprehension is expanded and enhanced5 CCimagebyLawrenceBerkeleyNational LaboratoryonFlickr
    10. 10. Data Sharing Access to related research enables community members to (cont’d): o increase transparency, reproducibility and comparability of results5 o expand methodology assessment, recommendations and improvement6 o educate new researchers as to the most current and significant findings6
    11. 11. Data Sharing Scientists that share data gain the benefit of: o research sponsor recognition as an authoritative source and wise investment o improved data quality due to expanded use, field checks, and feedback o greater opportunity for data exchange o improved connections to scientific network, peers, and potential collaborators CCimagebySLUMadridCampus onFlickr
    12. 12. Data Sharing Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure. CCimagebyCyberHadesonFlickr
    13. 13. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters
    14. 14. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data
    15. 15. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data lack of acknowledgement / credit
    16. 16. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data lack of acknowledgement / credit loss of advantage when competing for research dollars
    17. 17. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data lack of acknowledgement / credit loss of advantage when competing for research dollars
    18. 18. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters security and confidentiality of sensitive data lack of acknowledgement / credit loss of advantage when competing for research dollars metadata metadata metadata metadata
    19. 19. Data Sharing Concern Solution inappropriate use due to misunderstanding of research purpose or parameters provide rich Abstract, Purpose, Use Constraints and Supplemental Information where needed security and confidentiality of sensitive data • the metadata does NOT contain the data • Use Constraints specify who may access the data and how lack of acknowledgement / credit specify a required data citation within the Use Constraints loss data insight and competitive advantage when vying for research dollars create second, public version with generalized Data Processing Description
    20. 20. Data Sharing Step One: Create robust metadata that is discoverable o metadata should be complete, correct, document methodology for reproducibility and provide provenance o specify geography and time periods o use discipline specific theme, place and temporal keywords, thesauri, and ontologies o describe attributes o include links to associated data catalogues, data downloads, project websites, etc.
    21. 21. Data Sharing Step Two: Include archival and reference information. For example: o properly formatted data citations for the data and all sources o Universally Unique Identifiers (UUID) that uniquely identify your data and help to link the data with the metadata See the DataONE unique identifier guidance at: http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html Data Citation format:https://www.datacite.org/services/cite-your-data.html Data Citation Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
    22. 22. Data Sharing Step Three: Have data contributors review your metadata to ensure validity and organizational ‘correctness’? o are the methods / techniques described accurately? o are all contributions adequately identified? o has the appropriate management reviewed the product and documentation? o is the funding organization properly recognized?
    23. 23. Data Sharing Step Four: Publish your metadata and/or data via: e.g. Federal Data Catalogs o data.gov e.g. Data Repositories o Knowledge Network for Biodiversity (KNB) Data Portal o Long Term Ecological Research (LTER) Network Data Portal o Institutional data repositories e.g. Other Online Resources o Project and/or Program websites o Web-accessible folders (WAF) o Community or Public Cloud Searchable directory of repositories for publishing your data o http://service.re3data.org/search
    24. 24. Data Sharing • Document and publish data using standards • Promote data use via presentations and meetings • Solicit feedback from data users and address identified issues • Monitor publications and websites for data use and address misapplications
    25. 25. Data Sharing • In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain. • The goal of this project was to do research on a massive scale that would involve sharing and making accessible all the data uncovered to anyone in the world with a computer. • Dr. John Trojanowski an Alzheimer’s researcher at the University of Pennsylvania stated, “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.” • http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html CCimagebyAndrewMccluskeyonFlickr
    26. 26. Data Sharing • Data sharing adds value to the data • It is the responsibility of the researcher to share their data • Metadata supports data accountability, liability, and usability • Sponsors expect, some require, data to be shared • Data sharing is essential to the advancement of science
    27. 27. Data Sharing 1. Inter-university Consortium for Political and Social Research (ICPSR), ICPSR Guide to Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (ICPSR, 2009; http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf). [4th Edition] 2. Australian Bureau of Statistics - National Statistical Service (ABS-NSS), A good practice guide to sharing your data with others (ABS-NSS, 2009; http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002 FD676?opendocument). [Vers. 1] 3. H.A. Piwowar, A new task for NSF reviewers: Recognizing the value of data reuse. ResearchRemix vers. May 28, 2011 (http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/). [blog posting of draft] 4. H.A. Piwowar, M.J. Becich, H. Bilofsky, R.S. Crowley, Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med. 5(9), e183 (2008), doi:10.1371/journal.pmed.0050183. [on behalf of the caBIG Data Sharing and Intellectual Capital Workspace]
    28. 28. Data Sharing 5. J.L. Teeters, K.D. Harris, K.J. Millman, B.A. Olshausen, F.T. Sommer, Data Sharing for Computational Neuroscience. Neuroinform (2008), DOI 10.1007s12021-008-9009-y. [http://redwood.berkeley.edu/fsommer/papers/teetersetal08.pdf] 6. National Institute of Health (NIH) “NIH Data Sharing Policy and Implementation Guidelines” (NIH, Washington D.C., 2003, http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm ). 7. R. Geambasu, S.D. Gribble, H. M. Levy, "CloudViews: Communal Data Sharing in Public Clouds" In Proceedings of the First USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, USA, June 2009. [Paper: PDF; Presentation: PPT, PDF] 8. J. Niu, “Reward and Punishment Mechanisms for Research Data Sharing”. IASSIST Quarterly, Winter (2006).
    29. 29. Data Sharing 9. C.L. Borgman, “Research Data: Who will share what, with whom, when, and why?” in Proceedings of the China-North American Library Conference, Beijing , September 2010 (http://works.bepress.com/borgman/238/).
    30. 30. Data Sharing The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Data Sharing. DataONE. Retrieved Nov12, 2012. From http://www.dataone.org/sites/all/documents/L02_DataSharing. pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.

    ×