The document discusses open data and data sharing, including defining open data, the benefits of open data, overcoming barriers to opening data such as concerns about scooping and sensitive data, best practices for making data open through formats, licensing and description, and the role of research databases and data citation in promoting open data.
2. Data Sharing
• Research data may be shared in many ways.
• Getting Started looks at sharing data via access methods: Open, Shared and Closed Data
Thing 5
4. Data Sharing
Thing 5
1. What is 'open data'?
2. Who benefits from open data?
3. Overcoming barriers to opening data
4. Making data open
5. Open data in Research Data Australia
6. Thing 5
What is 'open data'?
Open data is:
1. freely available to download in a reusable form. Large or complex data may be accessible via a service or facility that enables access in situ or the compilation of sub-sets;
2. licensed with minimal restrictions to reuse
3. well described with provenance and reuse information provided
4. available in convenient, modifiable and open formats
5. managed by the provider on an ongoing basis.
The Open Data Handbook provides an introduction to the legal, social and
technical aspects of open data. It discusses what open data is as well as why
and how to make data open.
9. Thing 5
Who benefits from open data?
Everyone! According to the Royal Society, open data supports:
• new research and new types of research
• the application of automated knowledge discovery tools online
• the verification of previous results
• a broader base set of data than any one researcher can hope to collect
• the exploration of topics not envisioned by the initial investigators
• the creation of new data sets, information and knowledge when data from
multiple sources are combined
• the transfer of factual information to promote development and capacity
building in developing countries
• interdisciplinary, inter-sectoral, inter-institutional and international research.
10. Thing 5
Who benefits from open data?
[Figure: the many ways open data benefits researchers, research organisations, funders, policy makers and the broader community]
13. Thing 5
Overcoming barriers to opening data
Someone might use my data to 'scoop' me
1. Timing?
You may choose to restrict access to your data until a key paper is published.
You decide the appropriate time for making your data open.
14. Thing 5
Overcoming barriers to opening data:
Someone might use my data to 'scoop' me
2. What is the real risk of ‘scooping’?
There is little risk, according to Professor Isaac Kohane, Harvard Medical School, quoted in Nature: "[we] need to convince people that the likelihood of being scooped if they put their data out there [is] not going to be high ... we need to do away with a culture of sitting on data until we have mined every useful scientific grain out of it".
In a similar vein, some researchers report that any possible loss of future
potential papers is well offset by the more immediate rewards of data citations
and collaborative opportunities.
15. Thing 5
Overcoming barriers to opening data:
Someone might use my data to 'scoop' me
3. What is the real risk of ‘scooping’?
In fact, many researchers find that opening up their data has greatly benefited
their research.
Report - Professor Tim Gowers, Royal Society Research Professor, University
of Cambridge
• opened up his data to crowd-source an unsolved mathematical problem.
• 27 people made 800 substantive contributions to solve the problem in a
matter of days.
• Professor Gowers commented that this approach to research was "like driving
a car whilst normal research is like pushing it".
16. Thing 5
Overcoming barriers to opening data:
My data are sensitive due to cultural, ethical, ecological or security
considerations
There are circumstances where it may not be appropriate to make data open, e.g. where:
• individuals may be identified;
• threatened species may be located; or
• information affecting national security may be revealed.
17. Thing 5
Overcoming barriers to opening data:
My data are sensitive due to cultural, ethical, ecological or security
considerations
However, there may be ways to make sensitive data at least partially open.
This comprehensive 26-page ANDS guide, Publishing and Sharing Sensitive Data (PDF, 0.73 MB), outlines best practice for the publication and sharing of sensitive research data in the Australian context. It should be read in conjunction with the ANDS Introduction to Sensitive Data.
18. Thing 5
Overcoming barriers to opening data:
My data are sensitive due to cultural, ethical, ecological or security
considerations
ANDS Publishing and Sharing Sensitive Data Decision Tree
http://www.ands.org.au/__data/assets/pdf_file/0010/385309/sensitive-decision-tree.pdf
20. Thing 5
Overcoming barriers to opening data:
I won't get any recognition or reward for making my data open
Tools such as Thomson Reuters Data Citation Index enable citation metrics to be captured for data.
21. Thing 5
Overcoming barriers to opening data:
There are contractual or commercial interests associated with my data
• Research data may underpin a commercialisation opportunity such as a patent.
• Or it may be that, contractually, IP arising from a project is owned by a third party.
• In other cases, though, data is not shared because of the uncertainty arising from data not being explicitly addressed in contracts and project plans.
23. Thing 5
Making Data Open
Open data is:
Freely available to download
Which ideally means:
a) There is no cost to access the data;
b) Access is via an internet-accessible download;
c) Data is in a form that can be readily downloaded. Large or complex data is located close to high-performance computing or specialised services that enable access to the data in situ or the compilation of sub-sets.
So preferably not:
a) Costed at more than reproduction cost;
b) Burned to a DVD and posted via 'snail mail';
c) Only available in huge packages that are difficult to reuse and/or take days to download.
Licensed
Which ideally means: an open license such as CC-BY is applied.
So preferably not: a restrictive license or, worse, no license at all. If no license is applied, no reuse is permitted.
Well described
Which ideally means: standards-based metadata is used, with details of data elements and inclusion of data dictionaries. Describe the purpose of the collection, the characteristics of the sample and the method of data collection.
So preferably not: metadata descriptions that are very brief or will not be widely understood. Avoid jargon and abbreviations, and don't assume prior knowledge of the data or subject domain.
Provided in an open format
Which ideally means: the data is in a convenient, modifiable and open format that can be readily retrieved, downloaded, indexed and searched. Where possible, formats should be machine-readable, and non-proprietary formats are preferred. For example, prefer netCDF over .xls.
So preferably not: obscure formats, or formats that require proprietary software to open and reuse.
Well managed
Which ideally means: the data is managed on an ongoing basis, with a point of contact designated to assist with data use.
So preferably not: data that is loaded on to a server and forgotten.
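As a minimal sketch of the "well described" and "open format" rows, assuming a small tabular dataset, the Python below writes the data as CSV (an open, plain-text, machine-readable format) together with a JSON data dictionary describing each column. The column names, units and values are hypothetical placeholders, and CSV stands in for netCDF here only because it needs no third-party library.

```python
import csv
import io
import json

# Hypothetical example data: column names, units and values are placeholders.
rows = [
    {"site_id": "S01", "date": "2015-03-01", "rainfall_mm": 12.5},
    {"site_id": "S02", "date": "2015-03-01", "rainfall_mm": 0.0},
]

# A data dictionary documents each data element so reusers need no prior knowledge.
data_dictionary = {
    "site_id": {"type": "string", "description": "Identifier of the sampling site"},
    "date": {"type": "string", "format": "YYYY-MM-DD", "description": "Date of observation"},
    "rainfall_mm": {"type": "number", "unit": "mm", "description": "Daily rainfall total"},
}


def write_open_dataset(rows, dictionary):
    """Return (csv_text, json_text): the data as CSV and its data dictionary as JSON."""
    fieldnames = list(dictionary)  # column order follows the data dictionary
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue(), json.dumps(dictionary, indent=2)


csv_text, dictionary_json = write_open_dataset(rows, data_dictionary)
print(csv_text)
print(dictionary_json)
```

In practice the two strings would be saved alongside each other (e.g. as data.csv and data_dictionary.json, hypothetical file names) so that anyone can open them without proprietary software.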
25. Data Sharing
Thing 5
Open data in Research Data Australia
1. New interface highlights the openness of data
2. Licenses can be applied
26. Data Sharing
Wiley Survey
Thing 5
http://www.acscinf.org/PDF/Giffi-%20Researcher%20Data%20Insights%20--%20Infographic%20FINAL%20REVISED.pdf
27. Long-lived data: curation & preservation
https://youtu.be/qEmmeFFafUs
US Library of Congress (LoC)
Thing 6
29. Long-lived data: curation & preservation
Thing 6
What key advice would you give someone about preserving their born-digital objects, e.g. the family historian, a researcher, yourself? Hint: look for ideas on the Library of Congress Digital Preservation website.
31. Long-lived data: curation & preservation
Video - http://www.clir.org/initiatives-partnerships/data-curation
Sayeed Choudhury, Associate Dean for Research Data Management at Johns Hopkins University (a long video, summarised here), talks about the Stack Model for data management:
Thing 6
• Storage: disk, tape, cloud, etc.
• Archiving: identifiers for sharing and references
• Preservation: policy, metadata, long-term reuse
• Curation: adding value to data for reuse
32. Data citation for access & attribution
• Data citation continues the tradition of acknowledging other people’s work
and ideas.
• Along with books, journals and other scholarly works, it is now possible to
formally cite research datasets and even the software that was used to
create or analyse the data.
Thing 7
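To show how a dataset citation mirrors a conventional reference, here is a small Python sketch that assembles one from basic metadata. It follows a simplified version of the pattern DataCite recommends (Creator (PublicationYear): Title. Version. Publisher. Identifier); exact punctuation and elements vary between citation styles. The `DatasetRecord` class and every value in the example record (names, title, publisher, DOI) are hypothetical placeholders, not a real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal citation metadata for a dataset (hypothetical, illustrative fields)."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Build a citation string loosely following the DataCite pattern:
    Creator (PublicationYear): Title. Version. Publisher. Identifier"""
    parts = [f"{'; '.join(rec.creators)} ({rec.year}): {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    # Cite the DOI as a resolvable link so readers can follow it to the data.
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)


# Hypothetical record: names, title and DOI are placeholders, not a real dataset.
example = DatasetRecord(
    creators=["Citizen, Jane", "Smith, Alex"],
    year=2016,
    title="Example rainfall observations",
    publisher="Example University",
    doi="10.1234/example-dataset",
    version="1",
)
print(format_citation(example))
```

The version is optional because not every repository versions its datasets; when present, it tells readers exactly which release of the data was used.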
34. Data citation for access & attribution
https://researchdata.ands.org.au/monthly-drought-australia-drought-index/61872
Thing 7
35. Data citation for access & attribution
http://ands.org.au/working-with-data/citation-and-identifiers/data-citation
Thing 7
36. Thing 7
Data citation for access & attribution
Force11 Joint Declaration of Data Citation Principles
• a set of principles for citing data.
• based on the premise that data citation, like the citation of other evidence
and sources, is good research practice and is part of the scholarly
ecosystem supporting data reuse.
Since they were published 2 years ago, the Principles have been endorsed by
numerous individuals and more than 100 data centres, publishers and societies.
38. Thing 7
Data citation for access & attribution
The Force11 Data Citation Principles are endorsed by…
https://www.force11.org/datacitation/endorsements
39. Thing 7
Data citation for access & attribution
Given such support and clear direction,
why do you think data citation has not been uniformly adopted, so far, across all
disciplines?
40. Citation Metrics for Data
Thing 8
What are Digital Object Identifiers (DOIs)
and how do they support data citation and
metrics for data and related research
objects?
41. Citation Metrics for Data
Thing 8
DOIs:
• are unique identifiers;
• provide persistent access to published articles, datasets, software versions and a range of other research inputs and outputs.
• Over 120 million DOIs are in use.
• Last year, DOIs were "resolved" (clicked on) over 5 billion times!
• A typical DOI looks like this:
http://doi.org/10.4225/08/50F62E0D359D5
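A DOI has two parts: a prefix beginning with "10." that identifies the registrant, and a suffix chosen by the registrant, separated by the first "/". The Python sketch below, an illustration rather than any official tool, splits the example DOI above into those parts and rebuilds an https://doi.org/ link from a bare DOI, a "doi:" form or a resolver URL. The function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse

DOI_RESOLVER = "https://doi.org/"


def split_doi(value: str) -> Tuple[str, str]:
    """Split a DOI (bare, 'doi:'-prefixed, or a resolver URL) into (prefix, suffix).

    The prefix ends at the first '/'; the suffix may contain further '/' characters.
    """
    value = value.strip()
    if value.lower().startswith(("http://", "https://")):
        # Drop scheme and host, and undo any percent-encoding in the path.
        value = unquote(urlparse(value).path).lstrip("/")
    elif value.lower().startswith("doi:"):
        value = value[4:]
    prefix, sep, suffix = value.partition("/")
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"not a DOI: {value!r}")
    return prefix, suffix


def resolver_url(value: str) -> str:
    """Return an https://doi.org/ link for a DOI given in any of the accepted forms."""
    prefix, suffix = split_doi(value)
    return f"{DOI_RESOLVER}{prefix}/{suffix}"


print(split_doi("http://doi.org/10.4225/08/50F62E0D359D5"))
print(resolver_url("doi:10.4225/08/50F62E0D359D5"))
```

Because the suffix is opaque, the sketch only checks the "10." prefix and the separator; it does not confirm that the DOI is actually registered, which requires resolving it.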
42. Citation Metrics for Data
Thing 8
Google “The compendium of crop Proteins with Annotated Locations (cropPAL) version 1”
43. Data citation for access & attribution
http://ands.org.au/working-with-data/citation-and-identifiers/data-citation
Remember in Thing 7…
44. Citation Metrics for Data
Here’s a controversial question to discuss:
Should DOIs be routinely applied to all research outputs?
Remember that DOIs carry an expectation of persistence (maintenance costs etc.) but can be used to collect metrics as well as to link articles and data (evidence of impact).
Thing 8
45. Thing 8
Citation Metrics for Data
• Alternative metrics or altmetrics count the number of views, number of
downloads, social media "likes" and recommendations associated with a
dataset.
• Because of their immediacy, altmetrics can be an early indicator of the impact or reach of a dataset, long before formal citation metrics can be assessed.
46. Thing 8
Citation Metrics for Data
http://classic.sciencemag.org/articleusage?gca=sci%3B346/6210/763
Start by looking at the altmetrics for this Phylogenomics article published in Science.
48. Thing 8
Citation Metrics for Data
Look also at the associated data in Dryad, noting that the data has been assigned a DOI.
49. Thing 8
Citation Metrics for Data
By way of comparison, as of early April 2016:
• the same dataset had been cited once in Thomson Reuters Data Citation Index
• the article had been cited 143 times in Web of Science
Share your thoughts
Do you think altmetrics for data have value in academic settings? Why?
50. Contacts
Contact UWA 23 Things Coordinators:
Caroline Clark
caroline.clark@uwa.edu.au
Nola Steiner
nola.steiner@uwa.edu.au
Katina Toufexis
katina.toufexis@uwa.edu.au
Editor's Notes
It could be argued that few tangible rewards currently exist for those who make their data open. However, things are starting to change.
Tools such as Thomson Reuters Data Citation Index enable citation metrics to be captured for data, in much the same way as they are for publications. This provides the opportunity for data citation metrics to be included in project proposals, promotion cases and CVs.
Records in the Data Citation Index are intended to:
• provide attribution for a data object to the person(s) and institution(s) creating the data;
• provide a standard form of citation for each data object to encourage citation (the format of the data citation recommended by Thomson Reuters follows the DataCite guidelines);
• track citations and reuse of data in the scientific literature and provide bidirectional links between research articles and the data they use or generate;
• provide a means to discover data associated with research publications.
Broadly, in order to be accepted into the Data Citation Index, the records in a data source:
• must be able to provide the minimum metadata required to validate against the DCI schema. ANDS have developed a mapping from RIF-CS to the DCI schema and a guide to optimising records for DCI compliance. Elements needed to create a data citation must be present in the metadata;
• should describe data objects held in repositories under the control of the ANDS partner or data provider. Records should not point to institutional web pages or replicate metadata descriptions for data held in other repositories, e.g. PANGAEA. More information (PDF, 0.5 MB). If your data source does contain such records, they can be "tagged" for exclusion from the harvest to DCI;
• should describe data collections, datasets or repositories (see RIF-CS Type);
• meet the Thomson Reuters repository evaluation, selection and coverage policies.
Establishing a harvest from your Data Source to the Data Citation Index
The high-level workflow for including an RDA data source in the DCI harvest involves:
1. The RDA provider contacts their Outreach Officer or services@ands.org.au to express interest in establishing a DCI harvest.
2. ANDS and the provider review and discuss record quality and transform, as well as the proposed business processes, and agree to proceed.
3. ANDS provides an initial harvest from the Data Source to DCI and advises Thomson Reuters of the nominated contact for the Data Source.
4. Thomson Reuters assess a sample of records in the DCI output against their criteria for inclusion as described above. They also check quality of content, compliance with the DCI metadata schema and the richness of the record as assessed against the content available in the source repository.
5. Thomson Reuters staff will liaise directly with the nominated contact for the Data Source to discuss the metadata assessment and to create a Repository Record for the Data Source in DCI. This record provides the Repository Name in each DCI record. All collection records for the Data Source will be linked to this record in DCI. The screenshot below shows an example.
6. A production harvest from the Data Source to DCI is established.
7. Thomson Reuters provide a DCI admin log-in for use by the nominated Data Source contact.
8. Records are reharvested from RDA to DCI on a regular basis.
Ideally, discussions around data ownership, ongoing management and access should start at the project proposal stage.
Start from a position of "why not make the data open?" and consider how any perceived risks associated with making the data open can be addressed.
Let's look at the 5 characteristics of open data in a little more detail.
It's worth noting that even if you can't meet all the criteria for 'open data' there are benefits in making data as open as possible.
Fewer barriers = more opportunities for data to be reused and cited.
Open data in Research Data Australia
In December 2014 it became possible for collection descriptions in Research Data Australia (RDA) to include information that highlights the 'openness' of the data being described.
Collection records can be encoded as being openly accessible and openly licensed and include a link to download the data or access the data via a service. See Figure 1 below.
The new RDA interface released in April 2015 significantly raised the profile of data that has open characteristics.
The interface provides strong visual indicators for data that is publicly accessible online and offers search and browse options that enable users to easily discover and access open data. See Figure 2.
The goal is to maximise the reuse and citation of data described in RDA.
Take advantage of the opportunities these enhancements offer by ensuring your records provide the relevant RIF-CS encoding.
Also, be sure to apply an open licence where possible. A CC-BY licence is an open licence, but it also requires that the data provider receives attribution when the data is reused.
Take a look at this infographic from Wiley titled Research Data Sharing Insights [PDF, 2.08 MB]. It provides a succinct overview of current data sharing practice and perceptions.
Now look closely at the sections titled 'Global Data Sharing Trends' and 'Data Sharing By Discipline'.
What key advice would you give someone about preserving their born-digital objects, e.g. the family historian, a researcher, yourself? Hint: look for ideas on the Library of Congress Digital Preservation website.
Long-term preservation
Succession planning
ANDS suggests that we look at the Library of Congress (LoC) website.
This site gives information on how to preserve your digital materials.
These terms are used interchangeably, but this creates ambiguity.
Storage is necessary but not sufficient for archiving.
Each layer is required for the next.
Start by looking back to the Weddell Seal dataset we explored in Thing 4.
Check out how many times it has been cited.
This citation count has been measured by the Thomson Reuters Data Citation Index.
Click on the ‘Cite’ button to see the similarities between the formats for citation of data and other scholarly publications.
At eResearch we can mint a DOI for a dataset once it has been added to RDO.
Now look at the Hutchinson Drought Index data record in Research Data Australia. This research data makes cross-disciplinary connections between episodes of drought and correlated increases in rural mental health issues. The beauty of this record is that it shows the entirety of the research outputs (publications, software, related datasets and more), all of which are citable.
ANDS have a good introduction to data citation on their website
Awareness and support are not the same thing.
Endorsing these principles is great, but if you don't actively do anything at your institution to promote and support data citation practices, then what is the point?
I think there is a growing awareness of data citation principles, but actual practice is lagging behind because it will take a lot of time and effort to change the traditional research culture of keeping your data hidden away so no one else can scoop your work.
Libraries are well placed to work with researchers to try to change this culture by demonstrating the benefits of preserving, sharing, and citing data.
Google
Go to b2find
Resolve DOI
Go back to search results
Go to RDA
Show DOI at bottom
Show DOI in “Cite This” pop-up, as it appears in a structured citation.
Note the number and pattern of downloads for this article since it was published in November 2014.
Now click on the “donut” or the link to ‘More Details’ to see the wealth of information available.
Look also at the associated data in Dryad noting that the data has been assigned a DOI.
Can you see how many times the data has been downloaded and the record viewed (scroll down to the bottom of the record)?