2. Overview
• What is data?
• Challenges in working with
data
• Advantages of good data
management
• Data management plans
• Practicalities
– Back up and storage
– Ethics
– Sharing data
– Licensing
– Resources
3. Learning outcomes
• Identify the benefits and drivers for good data management
• Appreciate the common elements of an effective data
management plan and why it is desirable to complete one
• Understand the benefits and challenges of sharing data
• Know how to describe your data
• Reflect on best practice for managing digital data effectively
• Understand what further help is available in managing data
4. What is data?
• What kind of data do you collect?
• What challenges do you face in collecting data?
• Discuss in groups for 3 minutes
6. Advantages of RDM
Compliance with funders’ & institutional policies
Reduces the risk of data loss
Facilitates sharing and reuse of data
Enhances the visibility of your research
Provides opportunities for collaborations
7. Funder requirements
Include the following matters in the final report to the Society required under
clause 4.2(c):
(i) Which data and sample repositories will be used to store the metadata,
data and samples collected as part of the Programme and
(ii) Where the metadata will be stored if no data or sample repositories are
available
8. A view from RCUK
1. Make data openly available where possible
2. Have policies and plans for research data and preserve data with long-term value
3. Provide sufficient metadata for discovery and provide information on
access to data in publications
4. Consider legal, ethical and commercial constraints on release of research
data
5. Protect the efforts of research data creators with appropriate embargoes
6. Acknowledge the source of research datasets and abide by the terms
and conditions of use
7. Ensure cost-effective use of public funds for RDM
Credit: Loughborough University
11. What is a data management plan?
• DCC Checklist
“A Data Management Plan is a project document
which describes the data (or similar evidence) that
a project will collect, how it will be stored during
the project, how it will be archived at the end of
the project and how access will be granted to it
where appropriate.”
14. File formats for long-term access
• Non-proprietary
• Open, documented standard
• Common usage by research community
• Standard representation (ASCII, Unicode)
• Unencrypted
• Uncompressed
15. Make it so one thing can’t ruin everything
• Pen drives fail
• Hard disk stolen with laptop
• Hacked email account
• Viruses and malware
• Cloud service issues
• Fire
• Sunspots
• Cosmic rays
• Alien attack
• The Apocalypse
16. When Toy Story 2 almost vanished
Video: https://www.youtube.com/embed/yIz9eqwLt9U
17. Rule of three
Removable storage
• USB key
• Hard drive
Laptop or desktop
• Backed up corporate folder?
Cloud storage
• OneDrive / Google Drive
• Email
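Keeping three copies only helps if the copies are actually identical. As a minimal sketch (the function names and the idea of comparing SHA-256 checksums are illustrative assumptions, not part of the workshop material), the following checks that each backup copy of a file matches the original:

```python
import hashlib
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: Iterable[Path]) -> bool:
    """True only if every copy exists and has the same checksum as the original."""
    expected = sha256_of(original)
    return all(copy.exists() and sha256_of(copy) == expected for copy in copies)
```

For example, `copies_match(Path("thesis-data.csv"), [usb_copy, cloud_copy])` (all three paths hypothetical) returns `False` if either copy is missing or has changed.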
18. Ethics
Anonymity and confidentiality
• What personal information have you collected?
• What commitments have you made to protect
personal data?
• The Privacy Act
• What have you said in your ethics application?
• Whose data is it?
21. Metadata
• Data about data
• What elements might
you use to describe
data?
22. Data citation
• Academic impact is measured by
citation counts
• Your data should be cited by you and
others
23. Data set citation
• Cool, H. E. M., & Bell, M. (2011). Excavations at St
Peter’s Church, Barton-upon-Humber [Data set].
doi:10.5284/1000389
• DOIs are available from repositories e.g. UC
Research Repository, Figshare
24. Publishing data
• PLOS
• Data journals, e.g.
– Scientific Data
– Geoscience Data Journal
• Subject repositories, e.g. RePEc, arXiv
• Figshare, Dryad
• UC Research Repository
29. Case Study: CEISMIC Canterbury
Earthquakes Digital Archive
Enabling effective data
management and reuse:
• Discoverability
• Ethics
• Licensing
• Technical
30. Discoverability
- Submit to your IR
- Use unique identifiers or URIs
- Provide metadata – you are the
best source
Ethics
- Identify data of long-term value
- Consent forms should cover:
- Storage & access
- How data can be reused
Licensing
- Use NZ CC licenses for data
- Consider how ethics requirements
affect licensing
Technical
- Use ‘open’ formats, e.g. CSV
- Consider standards, e.g. http://dataprotocols.org/tabular-data-package/
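To give a feel for what such a standard asks of you, here is a rough sketch of a package descriptor that sits alongside a CSV file and names its columns and types. The dataset name, file path and fields are made up for illustration, and the exact keys should be checked against the current version of the specification before use:

```python
import json


def make_descriptor(name, csv_path, fields):
    """Build a minimal descriptor: one CSV resource with a simple column schema."""
    return {
        "name": name,
        "resources": [
            {
                "path": csv_path,
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical dataset with two columns.
descriptor = make_descriptor(
    "example-dataset", "data.csv", [("id", "integer"), ("label", "string")]
)
print(json.dumps(descriptor, indent=2))
```

The printed JSON would normally be saved as `datapackage.json` in the same folder as the data file.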
31. Why you should manage your data
Compliance with funders’ & institutional policies
Reduces the risk of data loss
Facilitates sharing and reuse of data
Enhances the visibility of your research
Provides opportunities for collaborations
33. ITS support
Virtual machines:
• Windows (currently Windows Server 2012)
• Linux (Red Hat Enterprise)
Bandwidth quota per month: 20 GB (40 GB for international students)
KAREN network from REANNZ
Storage and further resources on request
34. More help
RDM Subject guide
Anton Angelo
Research Data Coordinator
Liaison Librarians:
Kerry Gilmour
Dave Lane
Janette Nicolle
Cuiying Mu
Departmental IT Technicians
Peter Lund,
Manager, Research Support
35. Importance of data management plans
Credit: MANTRA – University of Edinburgh
36. Photo credits
Photos taken from Flickr and used with attribution under a CC licence
• Slide 1 Janeneka Staaks
• Slide 9 Caroline and Louis Volant
• Slide 10 Global Panorama
Editor's Notes
Good morning
I’ll start by making a few preliminary remarks about research data management at UC.
There is currently no research data management policy within the University of Canterbury, and researchers are not required to complete data management plans.
But in many areas of research, UC researchers have to obtain ethics approval for their research, and the ethics application will have some overlap with a data management plan.
Only a small number of UC PhD students become academic staff at the University of Canterbury. Many more will seek academic appointments elsewhere. We are teaching researchers transferable skills to help them find jobs doing research in a global marketplace. This means teaching you best practice.
Research staff also need to obtain grants to fund their research, and we think that grant-awarding bodies will increasingly favour bids which have thought about data.
By the end of this workshop participants will be able to:
University of Minnesota: In the Reference Model for an Open Archival Information System (OAIS) (Wikipedia), data is defined as "[a] reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen."
Types of data include:
observational data
laboratory experimental data
computer simulation
textual analysis
physical artifacts or relics
For social science, data is generally numeric files originating from social research methodologies or administrative records, from which statistics are produced. It also includes, however, more data formats such as audio, video, geospatial and other digital content that are germane to social science research.
Digital text is becoming increasingly important in the humanities and arts. Researchers in these areas may think of data in the form of textual information, semantic elements, and text objects. Digital Arts, Sciences, and Humanities (DASH), on campus, is an example of research emerging in this area.
STM publishers identified three technological trends for scientific publishing over the next 3-5 years.
1. The first is the emergence of Data as a First Class Research Object. For those unclear as to the significance of that phrase, such objects are key to ensuring the ongoing reproducibility and reusability of scientific research material.
2. A related trend pertains to the emerging importance of Reputation Management.
3. The third is the scholarly article as a crucial element in a hub and spoke model encompassing a variety of non-textual forms of content (video, data, software methods, other media, etc.). Those elements will ultimately be packaged, presented, and preserved in a smart network of connections that more effectively meets the needs of specific communities.
Marsden isn’t the only one:
LandCare Subcontract (within an MBIE-LandCare project).
This is quite complex:
“Subject to the restrictions in paragraph 29 of the New Zealand Government Open Access and Licensing framework (NZGOAL), which are specified in clause 8.4, the subcontractor will license all copyright works produced under this subcontract (excluding data that identifies an individual or an individual farm) including any reports, data, information, outputs, computer programme source code, other materials, and all intellectual property in the Deliverables on a Creative Commons Attribution 3.0 New Zealand licence.”
Research Funders’ data policies set expectations for the management and public availability of research data. RCUK’s seven common principles in brief are
Australian National Data Service – “building a cohesive collection of research resources from all research institutions to make better use of Australia’s research data outputs”
As you know the research process is a life cycle
In the US data is now considered an asset. Preservation and reuse of data is being built into the research life cycle. Good preservation allows for re-use and repurposing of the data.
It is difficult to overestimate the importance of preserving some data. Think about the Large Hadron Collider at CERN: it is difficult to go back and repeat the experiment!
http://www.bath.ac.uk/research/data/planning
The Digital Curation Centre provides a checklist to help you create a data management plan.
This is about file version control.
Keeping track of versions of documents and datasets is critical. Strategies include:
The top-level directory name should include the project title, a unique identifier, and the date.
The substructure should follow a clear, documented naming convention, such as numbering or naming the experiment runs, dataset versions, and/or researchers. Reserve the 3-letter file extension for application-specific codes, for example, formats like .wrl, .mov, and .tif.
Identify the activity or project in the file name.
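As an illustration of these strategies, here is a minimal sketch of a helper that builds consistent file names. The pattern (project, description, ISO date, two-digit version) and the function name are assumptions chosen for the example; agree your own convention with your group and document it.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, version: int,
                    when: date, extension: str) -> str:
    """Build a name of the form project_description_YYYY-MM-DD_vNN.ext."""
    def slug(text: str) -> str:
        # Lower-case, and collapse anything that is not a letter or digit into "-".
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (f"{slug(project)}_{slug(description)}_"
            f"{when.isoformat()}_v{version:02d}.{extension.lstrip('.')}")


# Hypothetical project and file, for illustration only.
print(build_file_name("QuakeStudy", "Survey responses", 2, date(2016, 3, 1), ".csv"))
# prints: quakestudy_survey-responses_2016-03-01_v02.csv
```

Putting the date in year-month-day order and zero-padding the version number means the files sort correctly in a directory listing.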
Technology changes - plan for both hardware and software obsolescence
Handout on common file formats
File formats more likely to be accessible in the future have the following characteristics:
Example file formats
ASCII, not Excel
MPEG-4, not QuickTime
TIFF or JPEG2000, not GIF or JPG
XML or RDF, not RDBMS
If you deposit your data in a repository, your files may be migrated to newer formats, so that they’re usable to future researchers.
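As one example of moving tabular data into an open, plain-text format, the sketch below writes rows out as CSV using only the Python standard library. The column names and values are invented for illustration; in practice the rows would come from your own spreadsheet or database export.

```python
import csv
import io


def write_csv(handle, header, rows):
    """Write a header and rows as comma-separated text to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Demonstrate with an in-memory buffer; values containing commas are quoted automatically.
buffer = io.StringIO()
write_csv(buffer, ["id", "note"], [[1, "plain"], [2, "has, comma"]])
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", encoding="utf-8", newline="")` so that the text encoding is explicit and unambiguous.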
In Australia…
The National Health and Medical Research Council (NHMRC) released a significant statement on data sharing:
"...NHMRC acknowledges the importance of making data publicly accessible. NHMRC encourages data sharing and providing access to data and other research outputs (metadata, analysis code, study protocols, study materials and other collected data) arising from NHMRC supported research...."
The statement also provides detail on how to share health and medical data, when to plan, and pointers to frameworks and standards on data quality and accessibility.
The full statement is at: http://www.nhmrc.gov.au/grants-funding/policy/nhmrc-statement-data-sharing.
SNAFU is a military slang acronym meaning "Situation Normal: All Fucked/Fouled Up."
How would you describe your data to make it discoverable?
What elements do you think are important to ensure your data can be found?
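One possible answer, as a sketch: record a small set of descriptive elements alongside the data and check that none is missing. The element names below are loosely modelled on Dublin Core, and which of them to require is an assumption for this example; the values come from the data set citation on slide 23.

```python
import json

# Elements this sketch treats as required; adjust to your repository's rules.
REQUIRED_ELEMENTS = ("title", "creator", "date", "identifier", "type")


def missing_elements(record):
    """Return the required elements that are absent or empty in a record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]


record = {
    "title": "Excavations at St Peter's Church, Barton-upon-Humber",
    "creator": ["Cool, H. E. M.", "Bell, M."],
    "date": "2011",
    "identifier": "doi:10.5284/1000389",
    "type": "Data set",
}

print(json.dumps(record, indent=2))
print(missing_elements(record))
```

A record with only a title would be reported as missing its creator, date, identifier and type, which is a useful prompt before depositing data.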
Having a data management plan will save you time in the long run and will help you communicate to funders.