Lesson 2 in a set of 10 created by DataONE on Best Practices fo Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
2. Data Sharing
• Data Sharing Within the Data Lifecycle
• Value of Data Sharing
• Concerns About Data Sharing
• Methods for Making Data Sharable
CCimagebyKevin
ByrononFlickr
3. Data Sharing
• After completing this lesson, the participant will be able to:
o discuss data sharing considerations within the data life cycle
o recognize the benefits of sharing scientific data
o address concerns about sharing data
o outline a process for making data sharable
o identify mechanisms for sharing data
CCimagebyPictureYouthonFlickr
4. Data Sharing
• Data sharing should be
addressed throughout
the data life cycle1
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
5. Data Sharing
• Several stages require critical attention to ensure effective
data sharing
document the data content, character and
process
Describe
store the data in a location from which it can be
accessed
Deposit
select storage formats and media with long term
use in mind
Preserve
publish information about the data so that
others can find it
Discover
6. Data Sharing
Data sharing requires effort, resources, and faith in others.
Why do it?
For the benefit of:
o the public
o the research sponsor
o the research community
o the researcher
CCimagebyJessicaLuciaonFlickr
7. Data Sharing
A better informed public engenders trust in science research
and can yield better decision making with regard to:
o Environmental and economic planning
o Federal, state, and local policies
o Social choices such as use of tax dollars and education options
o Personal lifestyle and health such as nutrition and recreation
CCimagebyfalonyatesonFlickr
8. Data Sharing
• Organizations that sponsor research must maximize the
value of research dollars
• Data sharing enhances the value of research investments by
enabling:
o verification of performance metrics and outcomes
o new research and increased return on investment
o advancement of the science
o reduced data duplication expenditures
o enhancing and extending the record of science
9. Data Sharing
Access to related research enables community members to:
o build upon the work of others and further, rather than repeat, the
science4
o More easily engage in interdisciplinary research
o perform meta analyses that cannot be performed with individual
datasets or laboratories5
o share resources and perspectives so that comprehension is expanded
and enhanced5
CCimagebyLawrenceBerkeleyNational
LaboratoryonFlickr
10. Data Sharing
Access to related research enables community members to
(cont’d):
o increase transparency, reproducibility and comparability of results5
o expand methodology assessment, recommendations and
improvement6
o educate new researchers as to the most current and significant
findings6
11. Data Sharing
Scientists that share data gain the benefit of:
o research sponsor recognition as an authoritative source and wise
investment
o improved data quality due to expanded use, field checks, and
feedback
o greater opportunity for data exchange
o improved connections to scientific network, peers, and potential
collaborators
CCimagebySLUMadridCampus
onFlickr
12. Data Sharing
Even if the value of data sharing is recognized, concerns
remain as to the impacts of increased data exposure.
CCimagebyCyberHadesonFlickr
15. Data Sharing
Concern Solution
inappropriate use due to
misunderstanding of research
purpose or parameters
security and confidentiality of
sensitive data
lack of acknowledgement / credit
16. Data Sharing
Concern Solution
inappropriate use due to
misunderstanding of research
purpose or parameters
security and confidentiality of
sensitive data
lack of acknowledgement / credit
loss of advantage when competing
for research dollars
17. Data Sharing
Concern Solution
inappropriate use due to
misunderstanding of research
purpose or parameters
security and confidentiality of
sensitive data
lack of acknowledgement / credit
loss of advantage when competing
for research dollars
18. Data Sharing
Concern Solution
inappropriate use due to
misunderstanding of research
purpose or parameters
security and confidentiality of
sensitive data
lack of acknowledgement / credit
loss of advantage when competing
for research dollars
metadata
metadata
metadata
metadata
19. Data Sharing
Concern Solution
inappropriate use due to
misunderstanding of research
purpose or parameters
provide rich Abstract, Purpose,
Use Constraints and Supplemental
Information where needed
security and confidentiality of
sensitive data
• the metadata does NOT
contain the data
• Use Constraints specify who
may access the data and how
lack of acknowledgement / credit
specify a required data citation
within the Use Constraints
loss data insight and competitive
advantage when vying for
research dollars
create second, public version with
generalized Data Processing
Description
20. Data Sharing
Step One:
Create robust metadata that is discoverable
o metadata should be complete, correct, document methodology for
reproducibility and provide provenance
o specify geography and time periods
o use discipline specific theme, place and temporal keywords, thesauri,
and ontologies
o describe attributes
o include links to associated data catalogues, data downloads, project
websites, etc.
21. Data Sharing
Step Two:
Include archival and reference information. For example:
o properly formatted data citations for the data and all sources
o Universally Unique Identifiers (UUID) that uniquely identify your data
and help to link the data with the metadata
See the DataONE unique identifier guidance at:
http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html
Data Citation format:https://www.datacite.org/services/cite-your-data.html
Data Citation Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of
morphological diversification in the absence of a detailed phylogeny: a case study from
characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
22. Data Sharing
Step Three:
Have data contributors review your metadata to ensure validity
and organizational ‘correctness’?
o are the methods / techniques described accurately?
o are all contributions adequately identified?
o has the appropriate management reviewed the product and
documentation?
o is the funding organization properly recognized?
23. Data Sharing
Step Four:
Publish your metadata and/or data via:
e.g. Federal Data Catalogs
o data.gov
e.g. Data Repositories
o Knowledge Network for Biodiversity (KNB) Data Portal
o Long Term Ecological Research (LTER) Network Data Portal
o Institutional data repositories
e.g. Other Online Resources
o Project and/or Program websites
o Web-accessible folders (WAF)
o Community or Public Cloud
Searchable directory of repositories for publishing your data
o http://service.re3data.org/search
24. Data Sharing
• Document and publish data using standards
• Promote data use via presentations and meetings
• Solicit feedback from data users and address identified
issues
• Monitor publications and websites for data use and address
misapplications
25. Data Sharing
• In 2003, a group of scientists from the National Institutes
of Health, the Food and Drug Administration, drug and
medical imaging industries, universities, and nonprofit
groups joined in a collaborative effort to find the
biological markers that show the progression of
Alzheimer’s disease in the human brain.
• The goal of this project was to do research on a massive
scale that would involve sharing and making accessible all
the data uncovered to anyone in the world with a computer.
• Dr. John Trojanowski an Alzheimer’s researcher at the
University of Pennsylvania stated, “It’s not science the way
most of us have practiced it in our careers. But we all
realized that we would never get biomarkers unless all of
us parked our egos and intellectual-property noses outside the door and
agreed that all of our data would be made public immediately.”
• http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html
CCimagebyAndrewMccluskeyonFlickr
26. Data Sharing
• Data sharing adds value to the data
• It is the responsibility of the researcher to share their data
• Metadata supports data accountability, liability, and
usability
• Sponsors expect, some require, data to be shared
• Data sharing is essential to the advancement of science
27. Data Sharing
1. Inter-university Consortium for Political and Social Research (ICPSR), ICPSR
Guide to Data Preparation and Archiving: Best Practice Throughout the Data
Life Cycle (ICPSR, 2009;
http://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf). [4th Edition]
2. Australian Bureau of Statistics - National Statistical Service (ABS-NSS), A good
practice guide to sharing your data with others (ABS-NSS, 2009;
http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002
FD676?opendocument). [Vers. 1]
3. H.A. Piwowar, A new task for NSF reviewers: Recognizing the value of data
reuse. ResearchRemix vers. May 28, 2011
(http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/).
[blog posting of draft]
4. H.A. Piwowar, M.J. Becich, H. Bilofsky, R.S. Crowley, Towards a Data Sharing
Culture: Recommendations for Leadership from Academic Health Centers.
PLoS Med. 5(9), e183 (2008), doi:10.1371/journal.pmed.0050183. [on behalf
of the caBIG Data Sharing and Intellectual Capital Workspace]
28. Data Sharing
5. J.L. Teeters, K.D. Harris, K.J. Millman, B.A. Olshausen, F.T. Sommer, Data
Sharing for Computational Neuroscience. Neuroinform (2008), DOI
10.1007s12021-008-9009-y.
[http://redwood.berkeley.edu/fsommer/papers/teetersetal08.pdf]
6. National Institute of Health (NIH) “NIH Data Sharing Policy and
Implementation Guidelines” (NIH, Washington D.C., 2003,
http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm
).
7. R. Geambasu, S.D. Gribble, H. M. Levy, "CloudViews: Communal Data Sharing
in Public Clouds" In Proceedings of the First USENIX Workshop on Hot Topics
in Cloud Computing (HotCloud), San Diego, USA, June 2009. [Paper: PDF;
Presentation: PPT, PDF]
8. J. Niu, “Reward and Punishment Mechanisms for Research Data Sharing”.
IASSIST Quarterly, Winter (2006).
29. Data Sharing
9. C.L. Borgman, “Research Data: Who will share what, with whom, when, and
why?” in Proceedings of the China-North American Library Conference, Beijing
, September 2010 (http://works.bepress.com/borgman/238/).
30. Data Sharing
The full slide deck may be downloaded from:
http://www.dataone.org/education-modules
Suggested citation:
DataONE Education Module: Data Sharing. DataONE. Retrieved
Nov12, 2012. From
http://www.dataone.org/sites/all/documents/L02_DataSharing.
pptx
Copyright license information:
No rights reserved; you may enhance and reuse for
your own purposes. We do ask that you provide
appropriate citation and attribution to DataONE.
Editor's Notes
This is Lesson 2 of the DataONE Data Management learning series. This lesson covers Data Management: Data Sharing
The topics covered in this lesson include: the role of data sharing within the lifecycle, the value of data sharing, concerns about data sharing, and methods for making data sharable.
After completing this lesson, you will be able to recognize the benefits of sharing scientific data, address concerns about sharing data, outline a process for making data sharable, and identify mechanisms for sharing data.
The Data Life Cycle is a continuum of data development, manipulation, management, and storage stages.
Data sharing should be addressed throughout the Data Life Cycle
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
Effective data sharing requires careful thought during each stage of the data development process including:
description and documentation of the data process, content, and character;
deposition and storage of the data in a location from which it can be accessed or shared;
preservation of the data using a format and media that enable long term reuse; and
making the data discoverable by publishing information about the data in research publications, data clearinghouses and data distribution portals.
Why expend the extra effort to share data? Because it benefits the public, the research sponsor, the research community and, perhaps most importantly, the researcher.
How does the public benefit from shared research?
The more informed the public the better they are able to understand and contribute toward effective public and personal decisions:
environmental and economic planning
federal, state and local policies
social choices such as voting, use of tax dollars and education options
personal lifestyles and health choices such as exercise, smoking, and nutrition
2Australian Bureau of Statistics, National Statistical Service (2009) A good practice guide to sharing your data with others, Vers. 1. http://www.nss.gov.au/nss/home.nsf/NSS/E6C05AE57C80D737CA25761D002FD676?opendocument
8Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
Why do research sponsors encourage data sharing? Because sponsors have an obligation to maximize the investment of research dollars.
Data sharing enhances the value of the research investment by enabling external reviewers to verify the project performance metrics and outcomes. This not only increases the credibility of the data but also spurs new research that can build upon the initial investment and advance the science rather than duplicate expenditures.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
3 Piwowar, H.A. (2011). A new task for NSF reviewers: Recognizing the value of data reuse. http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/
The scientific community as a whole also benefits from sharing among researchers. Data sharing allows researchers to build upon one another’s work and to further, rather than duplicate, the science by exploring new findings or combining findings into meta analyses that cannot be performed with individual data. In sharing data, the scientific community expands both individual perspectives and the collective comprehension.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
Access to related research enables members of the scientific community to better reproduce, compare and assess methods and results. Scientists are able to learn from one another and educate new researchers as to the most current and significant findings.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
6National Institute of Health (NIH) (2003). NIH Data Sharing Policy and Implementation Guidelines.
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
And finally, how does the independent researcher benefit from data sharing? When scientists share their data, they gain recognition as an authoritative source and respect as a wise investment for research dollars. When data are exposed, feedback from the broader community can be used to improve the quality and presentation of the data. Shared data also allows for greater opportunity for data exchange and networking opportunities with peers and potential collaborators.
4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
5Teeters, J.L., Harris, K.D., Millman, K.J., Olshausen, B.A., Sommer, F.T. (2008). Data Sharing for Computational Neuroscience. Neuroinform, DOI 10.1007s12021-008-9009-y
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
Even if the value of data sharing is recognized, concerns remain as to the impacts of increased data exposure.
Researchers may worry that the data will be taken out of context, misinterpreted or used inappropriately.
8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
They may also be concerned about maintaining the confidentiality and security of sensitive data.
8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
Business concerns may arise as well. Will data users give proper credit and acknowledgement to the scientist?
8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
Will the scientist lose a competitive advantage by sharing this valuable resource?
8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
8 Niu, J. (2006). Reward and Punishment Mechanisms for Research Data Sharing. IASSIST Quarterly, Winter 2006.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
9Borgman, C.L. Research Data: Who will share what, with whom, when, and why? In Proceedings of the China-North American Library Conference, Beijing , September 2010. (http://works.bepress.com/borgman/238/)
Each of these issues can, in great part, be addressed by providing rich data documentation known as ‘metadata’.
By providing metadata, the research scientist establishes the purpose, methods, sources and parameters of the data. As such, data users are given the information necessary to appropriately apply, protect and cite the data. If the metadata contains information about proprietary data processing or analysis techniques, the competitive advantage can be maintained by creating a second, more generalized, metadata record for public distribution.
The more robust your metadata, the easier your data will be discovered and the more appropriately it will be applied. Making data sharable involves the following steps.
Step 1: Create robust metadata that will be discoverable. Be specific in regards to geography and time periods. Use discipline specific themes, place names, and keywords. Describe any attributes and include links to associated data catalogues, data downloads, and project websites.
Step 2: Be sure to include archival and reference information with properly formatted data citations for sources and content. Include persistent data identifiers and any related metadata identifiers.
References:
http://en.wikipedia.org/wiki/Universally_unique_identifier
http://mule1.dataone.org/ArchitectureDocs-current/design/PIDs.html
Step 3: Be sure to have data contributors review their metadata to ensure validity and organizational correctness. Are the processes correct? Is your contribution adequately represented and reflected? Is your organization properly recognized and is the funding organization properly recognized? Be sure to get management and sponsor approval on the data publication including the content, presentation, and manner in which contributors are identified.
Step 4: Publish your metadata in data repositories and catalogs. Seek out relevant government catalogs and repositories developed by specific communities of practice. Look to your institution for data preservation services. Note, data published in disciplinary specific repositories are more like to be found by those in the same community.
Data Sharing ethics require researchers to follow some basic best practices.
First, make sure your data is available by documenting and publishing your data using standard formats and protocols.
Second, promote data use via presentations and meetings and make others aware of your data and findings. Third, solicit and be open to feedback from data users and address identified issues to improve your data. Lastly, monitor publications and websites for data use and address any misapplications found.
1Guide to social science data preparation and archiving: Best practice throughout the data life cycle, 4th edition (ICPSR, 2009)
4Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace (2008) Towards a Data Sharing Culture: Recommendations for Leadership from Academic Health Centers. PLoS Med 5(9): e183. doi:10.1371/journal.pmed.0050183
In 2003, a group of scientists from the National Institutes of Health, the Food and Drug Administration, drug and medical imaging industries, universities, and nonprofit groups joined in a collaborative effort to find the biological markers that show the progression of Alzheimer’s disease in the human brain.
The goal of this project was to do research on a massive scale that would involve sharing and making accessible all the data available to anyone in the world with a computer.
Dr. John Trojanowski, an Alzheimer’s researcher at the University of Pennsylvania, stated “It’s not science the way most of us have practiced it in our careers. But we all realized that we would never get biomarkers unless all of us parked our egos and intellectual-property noses outside the door and agreed that all of our data would be made public immediately.”
[Our “Data in Real Life” segment features a news report on this research. (?)]
In summary, data sharing adds value to the data. As such, it is the responsibility of the researcher to share their data. Metadata should be created for data resources to support data accountability, liability and usability. Research sponsors expect, and increasingly require, data to be shared.
THIS CONCLUDES Lesson 2.
To assess your learning on the content presented in this lesson, proceed to the next slide take the quiz.