a centre of expertise in data curation and preservation




             Create, curate, re-use:
the expanding life course of digital research data


                 Chris Rusbridge
           EDUCAUSE Australasia May 2007
                                                                                                         Funded by:
     This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
     UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit
     http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or (b) send a letter to Creative
     Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.




                    Contents
     • Science and digital curation
     • Why are data important?
     • What kinds of data?
     • What to do with your data: frontiers of
       practice
     • Repository frontiers
     • Changing practice



EDUCAUSE Australasia 2007




  Digital Curation Centre Mission
       “The over-riding purpose of the DCC is to
       support and promote continuing improvement
       in the quality of data curation, and of
       associated digital preservation”








                Summarising…
     • Sustainability
     • Creation or selection
     • Growth, development
     • Making available
     • Access management
     • Re-usability
     • Linkage, context, metadata
     • Authenticity, integrity, provenance
     • Maintaining meaning over time
     • Preserving, including past states
     • De-selection…
     • Extended time
     • Budget and policy impacts
     • People issues!






           Science and curation
     • Creating and managing data suitable for re-use
     • Good curation supports good science (managing
       your data properly)
        • Poor curation allows sloppy science?
     • Data curation should save money
        • Murray-Rust/Frey on interesting but fruitless experiments!
     • Some science impossible without curation…
        • QCD strong coupling constant prediction (Bethke)
        • Viscosity of earth mantle from Shang Dynasty eclipse
          records (Pang et al)
        • Science depending on past baselines (eg environmental,
          social sciences)





            Records of science
     • Data increasingly important as evidence
        • Key part of the scholarly record (public good)
           • Unrepeatable observations & experiments
        • Experimental verifiability (the basis of science)
           • Would Chang retractions have been reduced if his first
             data were available?
        • Allows additional interpretations
        • Legal and compliance
           • See APSR/AERES report for good examples







           What kinds of data?
     • Observations
        • eg UARS (Upper Atmosphere) Level 0: telemetry
        • UARS Level 1: measured physical parameters (post
          calibration?)
     • Derived data
        • UARS Level 2: calculated geophysical? profiles
        • UARS Level 3: gridded, interpolated?
     • Combined data
     • Crafted data
        • Eg annotated gene/protein databases
     • Descriptive (meta)data






Retaining research data means…
     • Data secure against loss (within group)
     • Communal repository (secure bit dump)
     • Re-usable, sharable information
     • As above, plus active curation (eg bio-
       informatics)
     • Long term preservation of information

     • Be clear what you are trying to do!





     … or the data trajectory is…
     • Hard drive → lost (crash)
     • Hard drive →DVD →Cardboard box →Loft
       →Skip/dumpster → lost

     • Sometimes this is a very bad thing
     • Sometimes these are the right options!








        Long term bit storage…
     • A solved problem? Just requires well-
       understood good data management
       practices?
     • Wrong! For very large datasets over very long
       time, there are significant problems…




                 BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J.
                 & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys
                 '06. Leuven, Belgium, ACM.





   How Well Must We Preserve?
   Keep a petabyte for a century
   – With 50% chance of remaining completely undamaged

   Consider each bit decaying independently
   – Analogy with radioactive decay

   That's a bit half-life of 10^18 years
   – One hundred million times the age of the universe

   That's a very demanding requirement
   – Hard to measure
   – Even very unlikely faults will matter a lot

Slide from David Rosenthal, LOCKSS
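
The half-life figure on this slide can be checked with a few lines of arithmetic. The sketch below is not from the original talk. It assumes a decimal petabyte (10^15 bytes), independent exponential decay of each bit as in the radioactive-decay analogy, and an age of the universe of roughly 1.38 × 10^10 years. The function and variable names are illustrative.

```python
import math


def required_bit_half_life_years(num_bits, years, survival_probability):
    """Half-life each bit needs so that ALL bits survive `years`
    with the given probability, assuming independent exponential decay."""
    # P(one bit survives) = exp(-rate * years)
    # P(all bits survive) = exp(-rate * years * num_bits)
    decay_rate = -math.log(survival_probability) / (num_bits * years)
    return math.log(2) / decay_rate


PETABYTE_BITS = 8 * 10**15          # assumption: decimal petabyte, 8 bits per byte
AGE_OF_UNIVERSE_YEARS = 1.38e10     # assumption: approximate value

half_life = required_bit_half_life_years(PETABYTE_BITS, 100, 0.5)
ratio = half_life / AGE_OF_UNIVERSE_YEARS

print(f"required bit half-life: {half_life:.1e} years")
print(f"multiples of the age of the universe: {ratio:.1e}")
```

With a 50% survival target the result reduces to (number of bits) × (years), about 8 × 10^17 years, which the slide rounds to 10^18. That is tens of millions of times the age of the universe, the same order of magnitude as the slide's rounded figure.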




       What to do about curation
     • Build curation/reusability into your workflow
        • Curation begins before creation
        • What’s easy at first becomes (impossibly) hard
          later
        • Describe your data (metadata schemas,
          “representation info”, etc)
        • Keep experimental parameters (technical, who,
          what, when, where)
        • Keep ability to process
        • Keep data!
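
One way to act on "describe your data" and "keep experimental parameters" is to write a small metadata record alongside each data file at creation time. The sketch below is only an illustration: the field names, the file name and the sensor identifier are hypothetical, and a real project would follow whatever metadata schema its discipline has agreed.

```python
import json


def describe_dataset(data_file, who, what, when, where, data_format, parameters):
    """Return a metadata record for one data file (field names are illustrative)."""
    required = {
        "data_file": data_file,
        "who": who,
        "what": what,
        "when": when,
        "where": where,
        "format": data_format,
    }
    # Refuse to produce a record with gaps: it is easy to fill these in now,
    # and much harder later.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    record = dict(required)
    record["parameters"] = dict(parameters)  # technical / experimental settings
    return record


# Hypothetical example values, for illustration only.
record = describe_dataset(
    data_file="run_042.csv",
    who="A. Researcher",
    what="Hypothetical temperature series",
    when="2007-05-01",
    where="Lab 3",
    data_format="CSV, UTF-8, one reading per row",
    parameters={"sensor": "hypothetical-T100", "sample_interval_s": 10},
)
sidecar_text = json.dumps(record, indent=2, sort_keys=True)
print(sidecar_text)
```

Storing the record as plain JSON next to the data keeps it readable without the software that produced it.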






    What to do about curation - 2
     • Use standard/agreed formats for data
     • Make ownership & restrictions clear, &
       explain how to cite your data
     • Offer for deposit in institutional or discipline
       repository
        • Appraisal and selection essential
        • Possible time-limited embargos
     • “Publish” data in support of articles






 Internet Archaeology: publication with
                 data








           Database as book…
     • Buneman (early pilot) work on IUPHAR database
     • MySQL to XML database
        • Historic to logical schema
     • XML via XSLT to LaTeX








                The StORe vision
     • Seamless transport from research data to research publications
       and vice versa
     • Bi-directional links proven in social science e-research but
       capable of export to other disciplines

     [Diagram labels: Source, Middleware, Output]




                 http://jiscstore.jot.com/WikiHome/
Slide from Graham Pryor




 What are the reusability issues?
     • Data not neutral to hypothesis
     • Hard to know the risks & pitfalls of a particular
       dataset
     • Data not self-describing: hard to find
       appropriate data (but see Murray-Rust on
       Googling InChI etc)
     • Hard to “understand” data once found
        • Really need information, not data!
     • Hard to use data once understood





                     Context
     • Data meaningless without context
        • Metadata of many kinds
        • Representation information… from data to
          information
        • Linkage and connection between datasets
        • Use your workflow!
     • Provenance
        • Authenticity/integrity
        • Computational lineage






 [Figure: diagram with labels NASA; Csat 8-day composite and subscene; SST;
  PAR subscene; Pbopt calc; Ctot calc; Zeu calc; PPeu calc; University
  research group 1; University research group 2; research group 3; local
  decision-making body]



Slide from Rajendra Bose




            Access and re-use
     • Ethics and rights control access
        • Weak in expressing this long-term
     • Collaboration tools
        • Annotation, discussion, review (see DART…)
        • Re-use leading to change and development
     • “Publication”
        • Not just in “print”
        • Underlying data should be “published”, too






      Database citation issues…
     • Citation for human readers and machine use cases
     • Granularity: database, record, item
     • Citation of changing objects
        • Version change (eg W3C practice: no version = latest, vs bibliographic:
          no version = first)
        • An efficient way to reference and access “archived” past states of
          more rapidly changing dataset, eg Genomics… datasets that result
          from the combined work of curators, or contain opinions or facts likely
          to change (work in progress, Buneman et al)
     • Standards conflict and are immature (NLM best?)

     • Citation ESSENTIAL for motivating quality academic work on data
       management and curation
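
The version-change bullet can be made concrete with a small sketch of how a citation without a version number resolves under the two conventions named above. This is an illustration only: the function name, the integer version numbers and the "latest"/"first" labels are assumptions, not part of any citation standard.

```python
def resolve_citation(available_versions, cited_version=None, convention="latest"):
    """Pick the version a citation refers to.

    convention="latest": an unversioned citation means the newest version
                         (the W3C-style practice mentioned above).
    convention="first":  an unversioned citation means the first version
                         (the bibliographic practice mentioned above).
    """
    if not available_versions:
        raise ValueError("no versions available")
    if cited_version is not None:
        # An explicit version is unambiguous, but only if that past state
        # has actually been kept.
        if cited_version not in available_versions:
            raise LookupError(f"version {cited_version} was not preserved")
        return cited_version
    if convention == "latest":
        return max(available_versions)
    if convention == "first":
        return min(available_versions)
    raise ValueError(f"unknown convention: {convention}")


versions = [1, 2, 3]  # hypothetical archived states of a database
print(resolve_citation(versions))                      # newest state
print(resolve_citation(versions, convention="first"))  # first state
print(resolve_citation(versions, cited_version=2))     # explicitly cited state
```

The same unversioned citation lands on different states of the database depending on the convention, which is why citing an explicit version only works if past states are archived.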






            Who does curation?
     •   Individuals
     •   Departments or groups
     •   Institutions, maybe through libraries
     •   Communities
     •   Disciplines
     •   Publishers
     •   National services
     •   Other 3rd parties…





            Curation: Individual
     • “Small science 2-3 times more data than Big
       science”, but much more at risk
     • PhD student? RA? PI? Administrator? IT support?
     • Data potentially on local hard drives, or at best
       shared network drives
        • May be inadequately protected
        • Liable for policy-led deletion on resignation
     • Individual “knows” too much (tacit knowledge)
        • Documentation/metadata unlikely to be adequate
     • Future: gone!





            Curation: Individual




© Marita Bushell




         Department: eCrystals
                                       •   Partnership with Institutional
                                           Repository
                                       •   Specialist department
                                           archive (& national service)
                                       •   Workflow recording of lab
                                           parameters (R4L)
                                       •   Public & private elements
                                       •   Trying to build eCrystals
                                           federation (eBank 3)
                                       •   Future: likely to continue








  Data in institutional repositories








    Institution: Cambridge Chemistry
                                         •   175,000 small molecule
                                             structures in CML
                                         •   Alongside Archaeology,
                                             Manuscripts, Learning
                                             Materials, etc
                                         •   No library curation skills;
                                             dependent on research
                                             group enthusiast
                                         •   Collection isolated from
                                             other Chemistry
                                         •   (Only 5 UK institutional
                                             repositories claim to hold
                                             data)
                                         •   Future: assured…





         Community: LOCKSS?
                                         •   Self-selected group of
                                             collectors: closest to genuine
                                             open activity (despite
                                             Alliance)?
                                         •   Traditionally libraries
                                             collecting eJournals
                                         •   Model respects IPR
                                         •   No domain expertise; rely on
                                             origins
                                         •   Data limitations…
                                         •   Future: potentially very
                                             persistent (low cost, high
                                             reliability, attack resistance,
                                             distributed)





 Discipline: Atmospheric Science
                                         • Strong believer in need
                                           for domain scientists as
                                           curators
                                         • Significant participant in
                                           “community proxy”
                                           agenda-setting activities
                                         • Internationally
                                           fragmented resources
                                         • Future: mostly
                                           dependent on grant
                                           funding (but strong
                                           commitment)





       Discipline: Pharmacology
                                         • International Scientific
                                           Union
                                         • Attempting to build
                                           credit for data
                                           contributions
                                         • Future: extremely
                                           limited funding








  Bio-informatics: Nature article
            23 June 05
    • Databases in Peril
        • 51 out of 89 biological databases contacted reported they
          were struggling financially
        • 7 have closed
        • Several being updated in owner’s spare time
        • (Notes that not all deserve long term support)

    • [Nucleic Acids Research reports 968 databases in
      2007!]
    • Major issue: money





      Publisher: Crystallography
                                         •   Publisher and Scientific
                                             Union
                                         •   Created key domain
                                             crystallographic standard
                                             (CIF)
                                         •   Strong motivator for deposit
                                             of structure data
                                         •   Consistent quality checks
                                         •   DOIs used for structure data
                                         •   Future: publishing business
                                             model



Slide from IUCr




   National bodies: British Library
                                         • Serious and robust
                                           approach
                                         • Legal deposit powers &
                                           responsibilities as driver
                                         • Oriented primarily
                                           towards “cultural
                                           heritage” (broadly
                                           interpreted)
                                         • Little data, no science
                                           domain experience
                                         • Future: strong future
                                           commitment





    National bodies: TNA/NDAD
                                         •   Specialist archive for
                                             government datasets
                                         •   Understand government
                                             regulations, dynamics &
                                             requirements
                                         •   Subject generalists;
                                             disconnected from
                                             associated science
                                         •   Technology specialists
                                             (understand databases)
                                         •   Future: likely to pass
                                             eventually to The National
                                             Archives







            3rd parties: Portico
                                         • Specific area: eJournals
                                         • Depends on publisher
                                           agreements
                                         • No data or domain
                                           science expertise
                                         • Future: commitment
                                           from Mellon +
                                           publishers +
                                           subscriptions, good
                                           funding mix






     3rd Parties: Iron Mountain?
                                         • Records management
                                           IS a curation problem
                                         • Organisations like this
                                           very likely to branch out
                                         • No domain science
                                           expertise
                                         • Future: business case,
                                           viability, stock market…








      3rd parties: Web 2.0 style,
            Swivel.com??








       Institutions & the network
     • Institutions have fundamental sustainability
     • Disciplines have domain knowledge advantage but sustainability is an
       issue
     • Can we get the best of both?
     • Needs serious work to examine!

                      Inst’n 1   Inst’n 2   Inst’n 3
     Discipline 1        X                     X
     Discipline 2                   X          X
     Discipline 3        X          X
     etc






   Who are the curation players?








               Cultural change
     • If we build it, will they come? NO!!
     • Outreach important: communication with
       scientists and researchers is hard graft
     • Cultural change to new approach requires more:
        • Incentives, rewards and mandates
        • Successful exemplars (well publicised)
        • Discipline-oriented approach (one size does not fit all)








            Australian context?
     • In the emerging context of the Research
       Quality Framework, and the expected
       National Collaborative Research
       Infrastructure Strategy, curation can only
       increase in importance!








                            Thank you




                               •(Citations in paper in proceedings)
EDUCAUSE Australasia 2007

More Related Content

What's hot

Introduction to Research Data Management at UWA
Introduction to Research Data Management at UWAIntroduction to Research Data Management at UWA
Introduction to Research Data Management at UWA
Katina Toufexis
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
Katina Toufexis
 
UWA Research Week 2016
UWA Research Week 2016UWA Research Week 2016
UWA Research Week 2016
Katina Toufexis
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few Difficulties
Martin Donnelly
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curation
Michael Day
 
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
The TMC Library
 
data curation issues
data curation issuesdata curation issues
data curation issues
Michelle Hudson
 

What's hot (7)

Introduction to Research Data Management at UWA
Introduction to Research Data Management at UWAIntroduction to Research Data Management at UWA
Introduction to Research Data Management at UWA
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
 
UWA Research Week 2016
UWA Research Week 2016UWA Research Week 2016
UWA Research Week 2016
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few Difficulties
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curation
 
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
 
data curation issues
data curation issuesdata curation issues
data curation issues
 

Viewers also liked

Blue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital PreservationBlue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital Preservation
Chris Rusbridge
 
Disciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesisDisciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesis
Chris Rusbridge
 
Pimp Je Bib
Pimp Je BibPimp Je Bib
LOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceLOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceChris Rusbridge
 
Under 4 Event Presentation
Under 4 Event PresentationUnder 4 Event Presentation
Under 4 Event Presentationbigslides
 
JISC Digital Library initiatives
JISC Digital Library initiativesJISC Digital Library initiatives
JISC Digital Library initiatives
Chris Rusbridge
 
20100122_Ep_Ehost_2_0_Ehis
20100122_Ep_Ehost_2_0_Ehis20100122_Ep_Ehost_2_0_Ehis

Viewers also liked (8)

Blue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital PreservationBlue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital Preservation
 
Disciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesisDisciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesis
 
Bollansee Jan
Bollansee JanBollansee Jan
Bollansee Jan
 
Pimp Je Bib
Pimp Je BibPimp Je Bib
Pimp Je Bib
 
LOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceLOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experience
 
Under 4 Event Presentation
Under 4 Event PresentationUnder 4 Event Presentation
Under 4 Event Presentation
 
JISC Digital Library initiatives
JISC Digital Library initiativesJISC Digital Library initiatives
JISC Digital Library initiatives
 
20100122_Ep_Ehost_2_0_Ehis
20100122_Ep_Ehost_2_0_Ehis20100122_Ep_Ehost_2_0_Ehis
20100122_Ep_Ehost_2_0_Ehis
 

Similar to Create, curate, re-use: the expanding life course of digital research data

Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...
Chris Rusbridge
 
ANDS and Data Management
ANDS and Data ManagementANDS and Data Management
ANDS and Data Management
Julia Gross
 
Scally The Library's Role in Research Data Management. OCLC partnership meeti...
Scally The Library's Role in Research Data Management. OCLC partnership meeti...Scally The Library's Role in Research Data Management. OCLC partnership meeti...
Scally The Library's Role in Research Data Management. OCLC partnership meeti...
John Scally
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
ASIS&T
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Ausplots Training - Session 1
Ausplots Training - Session 1Ausplots Training - Session 1
Ausplots Training - Session 1
bensparrowau
 
Research Data Management Services at UWA (July 2015)

Create, curate, re-use: the expanding life course of digital research data

  • 1. a centre of expertise in data curation and preservation Create, curate, re-use: the expanding life course of digital research data Chris Rusbridge EDUCAUSE Australasia May 2007 Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
  • 2. Contents
      • Science and digital curation
      • Why are data important?
      • What kinds of data?
      • What to do with your data: frontiers of practice
      • Repository frontiers
      • Changing practice
  • 3. Digital Curation Centre Mission
      • “The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
  • 5. Summarising…
      • Sustainability
      • Creation or selection
      • Growth, development
      • Making available
      • Access management
      • Re-usability
      • Linkage, context, metadata
      • Authenticity, integrity, provenance
      • Maintaining meaning over time
      • Preserving, including past states
      • De-selection…
      • Extended time
      • Budget and policy impacts
      • People issues!
  • 6. Science and curation
      • Creating and managing data suitable for re-use
      • Good curation supports good science (managing your data properly)
      • Poor curation allows sloppy science?
      • Data curation should save money
      • Murray-Rust/Frey on interesting but fruitless experiments!
      • Some science impossible without curation…
          • QCD strong coupling constant prediction (Bethke)
          • Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al)
          • Science depending on past baselines (eg environmental, social sciences)
  • 7. Records of science
      • Data increasingly important as evidence
      • Key part of the scholarly record (public good)
      • Unrepeatable observations & experiments
      • Experimental verifiability (the basis of science)
      • Would Chang retractions have been reduced if his first data were available?
      • Allows additional interpretations
      • Legal and compliance
      • See APSR/AERES report for good examples
  • 8. What kinds of data?
      • Observations
          • eg UARS (Upper Atmosphere) Level 0: telemetry
          • UARS Level 1: measured physical parameters (post calibration?)
      • Derived data
          • UARS Level 2: calculated geophysical? profiles
          • UARS Level 3: gridded, interpolated?
      • Combined data
      • Crafted data
          • Eg annotated gene/protein databases
      • Descriptive (meta)data
  • 9. Retaining research data means…
      • Data secure against loss (within group)
      • Communal repository (secure bit dump)
      • Re-usable, sharable information
      • As above, plus active curation (eg bio-informatics)
      • Long term preservation of information
      • Be clear what you are trying to do!
  • 10. … or the data trajectory is…
      • Hard drive → lost (crash)
      • Hard drive → DVD → Cardboard box → Loft → Skip/dumpster → lost
      • Sometimes this is a very bad thing
      • Sometimes these are the right options!
  • 11. Long term bit storage…
      • A solved problem? Just requires well-understood good data management practices?
      • Wrong! For very large datasets over very long time, there are significant problems…
      • BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.
  • 12. How Well Must We Preserve?
      • Keep a petabyte for a century
          – With 50% chance of remaining completely undamaged
      • Consider each bit decaying independently
          – Analogy with radioactive decay
      • That's a bit half-life of 10**18 years
          – One hundred million times the age of the universe
      • That's a very demanding requirement
          – Hard to measure
          – Even very unlikely faults will matter a lot
      • Slide from David Rosenthal, LOCKSS
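The half-life figure on slide 12 can be checked with a short calculation. The sketch below is not from the slides. It assumes a decimal petabyte (10**15 bytes, 8 bits each), an age of the universe of roughly 1.38e10 years, and independent exponential decay per bit as in the radioactive-decay analogy. The function name and constants are illustrative. Under these assumptions the required half-life comes out at about 8 x 10**17 years, consistent with the slide's order-of-magnitude figure of 10**18, and within a factor of two of "one hundred million times the age of the universe".

```python
import math

# Assumptions (not stated on the slide): decimal petabyte, approximate cosmic age.
PETABYTE_BITS = 8 * 10**15
YEARS = 100
TARGET_SURVIVAL = 0.5
AGE_OF_UNIVERSE_YEARS = 1.38e10


def required_bit_half_life(n_bits, years, p_all_survive):
    """Half-life T (in years) each bit needs so that all n_bits survive.

    One bit survives `years` with probability 2 ** (-years / T).
    For n independent bits, p_all = 2 ** (-n_bits * years / T), hence
    T = n_bits * years / log2(1 / p_all).
    """
    return n_bits * years / math.log2(1.0 / p_all_survive)


half_life = required_bit_half_life(PETABYTE_BITS, YEARS, TARGET_SURVIVAL)
ratio = half_life / AGE_OF_UNIVERSE_YEARS

print(f"required bit half-life: {half_life:.1e} years")
print(f"multiples of the age of the universe: {ratio:.1e}")
```

Because the target survival probability is exactly one half, the logarithm term is 1 and the half-life is simply the number of bits multiplied by the number of years.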
  • 13. What to do about curation
      • Build curation/reusability into your workflow
      • Curation begins before creation
      • What’s easy at first becomes (impossibly) hard later
      • Describe your data (metadata schemas, “representation info”, etc)
      • Keep experimental parameters (technical, who, what, when, where)
      • Keep ability to process
      • Keep data!
  • 14. What to do about curation - 2
      • Use standard/agreed formats for data
      • Make ownership & restrictions clear, & explain how to cite your data
      • Offer for deposit in institutional or discipline repository
      • Appraisal and selection essential
      • Possible time-limited embargos
      • “Publish” data in support of articles
  • 15. Internet Archaeology: publication with data
  • 16. Database as book…
      • Buneman (early pilot) work on IUPHAR database
      • MySQL to XML database
      • Historic to logical schema
      • XML via XSLT to LaTeX
  • 17. The StORe vision
      • Seamless transport from research data to research publications and vice versa
      • Bi-directional links proven in social science e-research but capable of export to other disciplines
      • [Diagram labels: Source, Middleware, Output]
      • http://jiscstore.jot.com/WikiHome/
      • Slide from Graham Pryor
  • 18. What are the reusability issues?
      • Data not neutral to hypothesis
      • Hard to know the risks & pitfalls of a particular dataset
      • Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChi etc)
      • Hard to “understand” data once found
      • Really need information, not data!
      • Hard to use data once understood
  • 19. Context
      • Data meaningless without context
      • Metadata of many kinds
      • Representation information… from data to information
      • Linkage and connection between datasets
      • Use your workflow!
      • Provenance
      • Authenticity/integrity
      • Computational lineage
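One common way to support the integrity point on slide 19 is to record a cryptographic digest alongside the data and recompute it later. The sketch below is an illustration only, not something the slides prescribe. The function names and the sample record are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """True if the data still matches the digest recorded at deposit time."""
    return sha256_digest(data) == recorded_digest


# Hypothetical observation record, used purely for illustration.
original = b"station,date,temperature_c\nA1,2007-05-01,12.5\n"
recorded = sha256_digest(original)  # stored with the dataset's metadata

altered = original.replace(b"12.5", b"12.6")

print(is_unchanged(original, recorded))
print(is_unchanged(altered, recorded))
```

A matching digest shows only that the bits are unchanged. It says nothing about provenance or meaning, which still depend on the metadata and representation information listed on the slide.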
  • 20. [Figure: diagram. Recoverable labels: NASA; Csat 8-day composite and subscene; PAR subscene; SST and Pbopt calc; Ctot calc; Zeu calc; PPeu calc; University research group1, group2, group3; local decision-making body.] Slide from Rajendra Bose
  • 21. a centre of expertise in data curation and preservation Access and re-use • Ethics and rights control access • Weak in expressing this long-term • Collaboration tools • Annotation, discussion, review (see DART…) • Re-use leading to change and development • “Publication” • Not just in “print” • Underlying data should be “published”, too EDUCAUSE Australasia 2007
  • 22. a centre of expertise in data curation and preservation Database citation issues… • Citation for human readers and machine use cases • Granularity: database, record, item • Citation of changing objects • Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first) • An efficient way to reference and access “archived” past states of more rapidly changing dataset, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al) • Standards conflict and immature (NLM best?) • Citation ESSENTIAL for motivating quality academic work on data management and curation EDUCAUSE Australasia 2007
  • 23. a centre of expertise in data curation and preservation Who does curation? • Individuals • Departments or groups • Institutions, maybe through libraries • Communities • Disciplines • Publishers • National services • Other 3rd parties… EDUCAUSE Australasia 2007
  • 24. a centre of expertise in data curation and preservation Curation: Individual • “Small science 2-3 times more data than Big science”, but much more at risk • PhD student? RA? PI? Administrator? IT support? • Data potentially on local hard drives, or at best shared network drives • May be inadequately protected • Liable for policy-led deletion on resignation • Individual “knows” too much (tacit knowledge) • Documentation/metadata unlikely to be adequate • Future: gone! EDUCAUSE Australasia 2007
  • 25. a centre of expertise in data curation and preservation Curation: Individual EDUCAUSE Australasia 2007 •© Marita Bushell
  • 26. a centre of expertise in data curation and preservation Department: eCrystals • Partnership with Institutional Repository • Specialist department archive (& national service) • Workflow recording of lab parameters (R4L) • Public & private elements • Trying to build eCrystals federation (eBank 3) • Future: likely to continue EDUCAUSE Australasia 2007
  • 27. Data in institutional repositories
  • 28. Institution: Cambridge Chemistry • 175,000 small molecule structures in CML • Alongside Archaeology, Manuscripts, Learning Materials, etc • No library curation skills; dependent on research group enthusiast • Collection isolated from other Chemistry • (Only 5 UK institutional repositories claim to hold data) • Future: assured…
  • 29. Community: LOCKSS? • Self-selected group of collectors: closest to genuine open activity (despite Alliance)? • Traditionally libraries collecting eJournals • Model respects IPR • No domain expertise; rely on origins • Data limitations… • Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
  • 30. Discipline: Atmospheric Science • Strong believer in need for domain scientists as curators • Significant participant in “community proxy” agenda-setting activities • Internationally fragmented resources • Future: mostly dependent on grant funding (but strong commitment)
  • 31. Discipline: Pharmacology • International Scientific Union • Attempting to build credit for data contributions • Future: extremely limited funding
  • 32. Bio-informatics: Nature article 23 June 05 • Databases in Peril • 51 out of 89 biological databases contacted reported they were struggling financially • 7 have closed • Several being updated in owner’s spare time • (Notes that not all deserve long term support) • [Nucleic Acids Research reports 968 databases in 2007!] • Major issue: money
  • 33. Publisher: Crystallography • Publisher and Scientific Union • Created key domain crystallographic standard (CIF) • Strong motivator for deposit of structure data • Consistent quality checks • DOIs used for structure data • Future: publishing business model • Slide from IUCr
  • 34. National bodies: British Library • Serious and robust approach • Legal deposit powers & responsibilities as driver • Oriented primarily towards “cultural heritage” (broadly interpreted) • Little data, no science domain experience • Future: strong future commitment
  • 35. National bodies: TNA/NDAD • Specialist archive for government datasets • Understand government regulations, dynamics & requirements • Subject generalists; disconnected from associated science • Technology specialists (understand databases) • Future: likely to pass eventually to The National Archives
  • 36. 3rd parties: Portico • Specific area: eJournals • Depends on publisher agreements • No data or domain science expertise • Future: commitment from Mellon + publishers + subscriptions, good funding mix
  • 37. 3rd Parties: Iron Mountain? • Records management IS a curation problem • Organisations like this very likely to branch out • No domain science expertise • Future: business case, viability, stock market…
  • 38. 3rd parties: Web 2.0 style, Swivel.com??
  • 39. Institutions & the network • Institutions have fundamental sustainability • Disciplines have domain knowledge advantage but sustainability is an issue • Can we get the best of both? • Needs serious work to examine! • [Matrix on slide: columns Inst’n 1, Inst’n 2, Inst’n 3; rows Discipline 1, Discipline 2, Discipline 3, etc; two X marks in each discipline row]
  • 40. Who are the curation players?
  • 41. Cultural change • If we build it, will they come? NO!! • Outreach important: communication with scientists and researchers is hard graft • Cultural change to new approach requires more: • Incentives, rewards and mandates • Successful exemplars (well publicised) • Discipline-oriented approach (one size does not fit all)
  • 42. Australian context? • In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
  • 43. Thank you • (Citations in paper in proceedings)