Data Selection & Triage



JISC/DCC Progress Workshop
Managing Research Data & Institutional Engagement
Nottingham, 25 October 2012

This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License

Introduction

How can researchers and support staff
effectively decide what data is worth holding
on to, agree what to do with it, and arrange
for its handover?
What challenges does this present?
How can they be addressed?
Outline

• What guidelines are there and why do we need
  more? - Angus Whyte, DCC and Marie Therese
  Gramstadt, KAPTUR
• UK Data Archive's Data Review Process - Veerle
  van Eynden UKDA
• Applying NERC's Data Value Checklist - Sam
  Pepler, British Atmospheric Data Centre
• Discussion
Guidelines clarify expectations

…adapted by
• Archaeology Data Service
• NERC
• KAPTUR
• University of Leicester

What criteria will be used to judge what’s handed over?
Basic model

1. Define a policy, i.e. criteria and range of decisions
2. Archive manager applies criteria, involving researchers
3. Select the significant, dispose of the rest

For records, yes, but research data?

[Diagram: all data, split into 10% and 90%]
Characterising research data…
• Research process more uncertain and open-ended
  than admin processes
• Research data purpose may change before complete
• More effort to make reusable: complex inter-relationships
  and richer contexts to document
• Originators should be engaged but may not have
  capacity e.g. if project funding has ceased
• Others may need to be involved with broader view of
  potential in other disciplines
• More than a keep/dispose choice: need to prioritise
  attention and effort to make data fit for reuse
Triage analogy

First characterise research data
Criteria:
• Duty of care
• Reuse value
• Quality and condition
• Accessibility
• Costs associated
Potential to automate?

Prioritise
• High reuse value + needs attention, affordable
• Other permutations
• More permutations
• Low reuse value, unaffordable

Deposit location
• Institutional Data Repository
• Data Centre
• Subject Repository etc.

Tiered approach to deploying resources
• Discoverability
• Access management
• Storage performance
• Preservation actions
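
The triage idea above can be sketched in code. This is a minimal illustration only: the field names, the 0 to 1 reuse score, the 0.7 threshold, the tier labels and the rule that a duty of care overrides everything else are all assumptions made for the example, not something taken from the slides or from any DCC tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical record of the criteria named on the slide."""
    name: str
    reuse_value: float      # assumed score from 0.0 (low) to 1.0 (high)
    needs_attention: bool   # quality/condition work needed before reuse
    affordable: bool        # associated costs can be met
    duty_of_care: bool      # e.g. an obligation to retain the data


def triage(ds: Dataset, high_reuse: float = 0.7) -> str:
    """Return an illustrative priority tier for one dataset.

    The tiers and the default threshold are assumptions for this sketch.
    """
    if ds.duty_of_care:
        return "retain"        # assumed: obligations come before value judgements
    high = ds.reuse_value >= high_reuse
    if high and ds.needs_attention and ds.affordable:
        return "prioritise"    # high reuse value, needs attention, affordable
    if not high and not ds.affordable:
        return "low priority"  # low reuse value, unaffordable
    return "review"            # the other permutations go to a human decision


if __name__ == "__main__":
    # Made-up datasets, for illustration only
    for ds in (
        Dataset("survey", 0.9, True, True, False),
        Dataset("scratch files", 0.1, False, False, False),
    ):
        print(ds.name, "->", triage(ds))
```

A rule set like this could only ever handle the clear-cut cases; the "review" tier stands in for the permutations where researchers and archive staff still need to decide together.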
Clarify expectations



What kinds of “data” are wanted?
For what kinds of reuse?
e.g. Data Centre Collection Policies

“The ADS expects to collect all of the following archaeological data types…”

http://archaeologydataservice.ac.uk/advice/collectionsPolicy

Costs should persuade us
IDC Digital Universe Study: increasing volumes outpace declining storage hardware costs




Source: John Gantz and David Reinsel (2011), Extracting Value from Chaos,
http://www.emc.com/digital_universe.
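
The claim on this slide can be made concrete with a little arithmetic. The sketch below uses purely hypothetical annual rates (they are not figures from the IDC study). It assumes that if data volume grows by a fraction g each year while cost per unit of storage falls by a fraction d, total storage cost is multiplied by (1 + g)(1 - d) each year, so it rises whenever that factor exceeds 1.

```python
def relative_storage_cost(growth: float, cost_decline: float, years: int) -> float:
    """Total storage cost after `years`, relative to today (1.0 = unchanged).

    growth       -- assumed annual fractional growth in data volume
    cost_decline -- assumed annual fractional fall in cost per unit stored
    """
    yearly_factor = (1 + growth) * (1 - cost_decline)
    return yearly_factor ** years


if __name__ == "__main__":
    # Hypothetical rates, chosen only to illustrate the compounding effect
    for year in (1, 5, 10):
        print(year, round(relative_storage_cost(0.60, 0.25, year), 2))
```

With these illustrative rates the yearly factor is 1.6 × 0.75 = 1.2, so the bill still grows by a fifth every year even though unit costs are falling.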

We can’t afford it all
“Keeping 2018’s data in S3 would cost the entire global GDP”




http://blog.dshr.org/2012/05/lets-just-keep-everything-forever-in.html

Selection presumes description

• You can’t value what you don’t know about!
• Researchers can’t afford NOT to spend effort
  on minimal metadata description and
  organisation, because costs of retention will
  be much higher if they don’t
• Description makes data affordable – is citation
  potential a concrete enough reward?
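
As a rough illustration of what a check for minimal description might look like, the sketch below flags dataset records that lack basic fields. The field names are hypothetical, chosen for this example only; they are not a schema recommended in the slides.

```python
# Hypothetical minimal fields; a real service would define its own schema.
REQUIRED_FIELDS = ("title", "creator", "description", "location")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


if __name__ == "__main__":
    # Made-up record, for illustration only
    record = {"title": "Survey 2011", "creator": "", "location": "shared drive"}
    print(missing_fields(record))  # ['creator', 'description']
```

A dataset that fails even a check this simple cannot be valued by anyone other than its originator, which is the point the slide makes.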


Challenges

• Identify what datasets are created
  and where they are
• Differentiate those that are of high
  value from those with the most
  uncertainty or least reusability
• Be able to justify ‘natural’ wastage
  of low priority data as much as
  deliberate selection of high value
Questions
• What has worked or is working?
• What lessons have you learned, and
  how generalisable are they?
• What challenges remain?
• How may they be approached, and
  what do you intend to do?
• What DCC / MRD activity do you
  think may help make the challenge
  more tractable?

