Long-term storage – will it fill up with the good stuff, or the big, bad, and ugly?
Can checklists make a difference? 
Angus Whyte, DCC 
‘Research Data Storage and Preservation Strategies’ 
University of Edinburgh 27 October 2014 
a.whyte@ed.ac.uk
Long-term storage – will it fill up with the good stuff, or just the big, bad, and ugly?
Will checklists encourage researchers to decide?
RDM Service Components 
www.dcc.ac.uk/resources/how-guides/how-develop-rdm-services
But more support needed! 
Top 3 support needs for institutions*
1. Defining what to retain
2. Specifying tools/infrastructure
3. Supporting metadata creation for research data discovery
*DCC RDM Survey of 61 institutions, March 2014
Data available at: zenodo.org/collection/user-dcc-rdm-2014
Data Asset Surveys 
Some institutions have estimated storage requirements from these 
About your data and its lifecycle…?
1. File type
2. Volumes
3. Density
4. Update frequency
5. Usage frequency
6. Availability required
7. Sensitivity
Active storage
Archival storage
But if you provide it, will researchers use it, and at what cost?
Data Asset Framework Implementation guide 
www.data-audit.eu/docs/DAF_Implementation_Guide.pdf
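Some institutions have estimated storage requirements from Data Asset Survey responses. A minimal Python sketch of that arithmetic, assuming a hypothetical survey record (`DataAsset` and its fields are illustrative names, not taken from the DAF Implementation Guide) and an assumed rule that anything updated at least monthly needs active storage:

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    """Hypothetical survey response for one data asset (illustrative fields)."""
    name: str
    volume_gb: float        # "Volumes": current size of the asset
    updates_per_year: int   # "Update frequency"
    keep_long_term: bool    # owner expects it to need archival storage


# Assumption: assets updated at least monthly need active storage.
ACTIVE_UPDATES_PER_YEAR = 12


def estimate_storage(assets: list[DataAsset]) -> dict[str, float]:
    """Sum surveyed volumes (GB) into active and archival estimates.

    An asset can count towards both totals, e.g. data still being
    updated that is also expected to be archived later.
    """
    totals = {"active": 0.0, "archival": 0.0}
    for asset in assets:
        if asset.updates_per_year >= ACTIVE_UPDATES_PER_YEAR:
            totals["active"] += asset.volume_gb
        if asset.keep_long_term:
            totals["archival"] += asset.volume_gb
    return totals


# Invented example responses, for illustration only.
assets = [
    DataAsset("Interview transcripts", 2.0, 4, True),
    DataAsset("Sensor feed", 500.0, 365, False),
    DataAsset("Model runs", 120.0, 52, True),
]
print(estimate_storage(assets))  # {'active': 620.0, 'archival': 122.0}
```

Density, usage frequency, availability and sensitivity would refine such an estimate; they are left out to keep the sketch short.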
Practical checklists 
Key points in the research cycle
Data Mgmt Plan 
1. Collection 
2. Documentation 
3. Ethics & legal 
4. Storage & backup 
5. Selection & preservation
6. Data sharing 
7. Responsibilities 
Repository selection
1. Policy & legal 
2. Discoverable 
3. Preservation 
4. Reports 
5. Trust 
Archival storage Active storage 
Data selection: 5 steps to decide what to keep
1. Could - benefit 
2. Must - risks 
3. Should - value 
4. Cost factors 
5. Weigh-up 1-4 
Catalogue metadata
1. Name 
2. Description 
3. Identifier 
4. Subject 
5. URL 
6. Date 
7. Creator 
8. Rights 
9. Spatial 
10. Publisher
Start 
Write-up
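The catalogue metadata checklist lends itself to a simple completeness check. A minimal Python sketch, assuming records are plain dictionaries keyed by lowercased versions of the ten field names above; `missing_fields` is a hypothetical helper, and treating every field as mandatory is an assumption ("Spatial", for example, may not apply to all data):

```python
# The ten catalogue metadata fields from the checklist (lowercased keys are an assumption).
CATALOGUE_FIELDS = (
    "name", "description", "identifier", "subject", "url",
    "date", "creator", "rights", "spatial", "publisher",
)


def missing_fields(record: dict[str, str]) -> list[str]:
    """Return checklist fields that are absent or blank, in checklist order."""
    return [f for f in CATALOGUE_FIELDS if not record.get(f, "").strip()]


# Invented example record.
record = {
    "name": "Survey responses 2013",
    "creator": "A. Researcher",
    "date": "2014-03",
    "identifier": "",  # blank values count as missing
}
print(missing_fields(record))
# ['description', 'identifier', 'subject', 'url', 'rights', 'spatial', 'publisher']
```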
Data selection checklist 
Preview at: this Google doc
Data selection checklist 
Straightforward steps to guide researchers 
① Could this data be reused?
② Must it be kept to manage compliance risk?
③ Should it be kept for its potential value and…
④ Considering costs
⑤ Will ✔ or won’t ✗ it be kept, and shared on what terms?
Institution or external repository
Step 1 (?): What ‘must’ be kept?
The research record includes data as evidence for, e.g.:
• Audit purposes
• Health & Safety (Lab book)
• Contractual requirement
Compliance is also about data that won’t be kept, or that may only be shared with approved researchers…
Research Ethics, Duty of Confidentiality, Data Protection Act, Human Rights Act, Statistics & Registration Service Act. UK Data Archive:
http://www.data-archive.ac.uk/create-manage/consent-ethics/legal
Available choices depend on what purposes the data serves 
Jisc Infonet Guidance on Managing Research Records 
tools.jiscinfonet.ac.uk/downloads/bcs-rrs/managing-research-records.pdf
Step 1 (?): What ‘must’ be kept?
But what about funder & journal data policies?
“Data with acknowledged long-term value”
RCUK Common Principles on Data Policy
“Data, information and other electronic resources of long-term interest”
ESRC UK Data Archive Collections Development Policy
“Where data underpins published research there is much greater expectation that it will be kept”
Ben Ryan, EPSRC
“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims.”
Nature
Still the researchers’ judgement: what purposes may the data serve?
So make thinking about those purposes the first step.
Step 1 (moved up from Step 2): What could it be reused for?
Are there any angles the researcher has not already considered?
1. Verification 
2. Further analysis 
3. Reputation building 
4. Resource development 
5. Further publications inc. data articles 
6. Learning and teaching materials 
7. Private reference 
Then, relative to these purposes, which data must be kept?
Step 3: What data should have value?
Do any two of these fit?
1. Good quality data and description 
complete, accurate, reliable, valid, representative, etc.
2. High demand 
known users, integration potential, reputation, recommendation, appeal 
3. High effort to replicate 
difficult, costly, or impossible to reproduce 
4. Low barriers to reuse 
legal/ethical and copyright: non-restrictive terms and conditions
5. Rarity value 
unique copy or other copies at risk 
Then: what else (e.g. software) does it depend on?
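Step 3’s rule of thumb, that data should be kept for its value if any two of the five criteria fit, can be written down directly. A minimal Python sketch; the function name `has_value` and the short criterion labels are hypothetical, and in practice each criterion is itself a researcher’s judgement rather than a simple yes/no flag:

```python
# Short labels for the five value criteria in Step 3 (labels are illustrative).
VALUE_CRITERIA = frozenset({
    "quality",      # good quality data and description
    "demand",       # high demand
    "replication",  # high effort to replicate
    "reusability",  # low barriers to reuse
    "rarity",       # rarity value
})


def has_value(criteria_met: set[str], threshold: int = 2) -> bool:
    """True if at least `threshold` of the value criteria fit (default: any two)."""
    unknown = criteria_met - VALUE_CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    return len(criteria_met) >= threshold


print(has_value({"demand", "rarity"}))  # True
print(has_value({"quality"}))           # False
```

Dependencies such as software are not captured by this check; they would need recording alongside the decision.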
Step 4: Cost factors
Why? 
• Costs incurred during project may add to value 
• Post-project costs must be covered 
1. Creation, collection & cleaning 
2. Short-term storage & backup 
3. Short-term access & security 
4. Team communication & development 
5. Preservation & long-term access 
So what action is needed to stay on budget?
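The cost factors split into costs incurred during the project and post-project costs that must be covered. A minimal Python sketch, assuming factors 1-4 fall within the project and factor 5 (preservation & long-term access) falls after it; the class, field names and figures are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    """Hypothetical estimate for the five Step 4 cost factors (any one currency)."""
    creation_collection_cleaning: float
    short_term_storage_backup: float
    short_term_access_security: float
    team_communication_development: float
    preservation_long_term_access: float

    def project_costs(self) -> float:
        """Factors 1-4, assumed to be incurred during the project."""
        return (self.creation_collection_cleaning
                + self.short_term_storage_backup
                + self.short_term_access_security
                + self.team_communication_development)

    def post_project_shortfall(self, committed_funding: float) -> float:
        """Unfunded part of factor 5, assumed to arise after the project ends."""
        return max(0.0, self.preservation_long_term_access - committed_funding)


# Invented figures, for illustration only.
estimate = CostEstimate(12000.0, 800.0, 500.0, 1500.0, 2400.0)
print(estimate.project_costs())                 # 14800.0
print(estimate.post_project_shortfall(1000.0))  # 1400.0
```

A non-zero shortfall flags the point at which action is needed to stay on budget.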
Step 5: Bring it all together
Balance risks, costs and value 
Document the choices made 
1. Name, contributors, description, sensitivity - metadata 
2. Reuse purposes and value – the ‘reuse case’ 
3. Risk of non-compliance and costs shortfall 
4. Justification to keep or dispose 
5. Actions to prepare for preservation or disposal
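Step 5 asks for the choices to be documented. A minimal Python sketch of such a record with a deliberately simple weigh-up rule; the class, its fields and the rule itself (keep if compliance requires it, otherwise keep only if at least two value criteria fit and there is no cost shortfall) are hypothetical, since the checklist leaves the balance of risks, costs and value to the researcher’s judgement:

```python
from dataclasses import dataclass, field


@dataclass
class SelectionRecord:
    """Hypothetical record of a Step 5 selection decision."""
    name: str
    contributors: list[str]
    description: str
    sensitivity: str
    reuse_case: str           # reuse purposes and value
    must_keep: bool           # compliance requires retention
    value_criteria_met: int   # how many Step 3 criteria fit
    cost_shortfall: float     # unfunded post-project cost
    actions: list[str] = field(default_factory=list)  # preservation/disposal steps

    def decide(self) -> tuple[bool, str]:
        """Return (keep?, justification) under the hypothetical rule above."""
        if self.must_keep:
            return True, "Retention required for compliance."
        if self.value_criteria_met < 2:
            return False, "Fewer than two value criteria fit."
        if self.cost_shortfall > 0:
            return False, f"Valuable, but {self.cost_shortfall:.2f} of post-project cost is unfunded."
        return True, "At least two value criteria fit and costs are covered."


# Invented example, for illustration only.
record = SelectionRecord(
    name="Interview transcripts",
    contributors=["A. Researcher"],
    description="Anonymised interview transcripts",
    sensitivity="Low after anonymisation",
    reuse_case="Further analysis; teaching materials",
    must_keep=False,
    value_criteria_met=3,
    cost_shortfall=0.0,
    actions=["Deposit in institutional repository"],
)
print(record.decide())
# (True, 'At least two value criteria fit and costs are covered.')
```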
But will this work?
From a research perspective, will active selection mean bureaucracy?
But will it work?
Is it easier to avoid selecting the good, and let someone else deal with de-allocation?
Data Mgmt Plan: enough to identify which project this data relates to
The ugly: “don’t know its value or where else to put it”
Archival storage Active storage
“The bad”: can’t share, as nobody knows its sensitivity
The “too big for anywhere else”
Thank you
