a centre of expertise in data curation and preservation




             Create, curate, re-use:
the expanding life course of digital research data


                 Chris Rusbridge
           EDUCAUSE Australasia May 2007
     This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
     UK: Scotland License, excluding content property of others. To view a copy of this license,
     (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or (b) send a letter to
     Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.




                    Contents
     • Science and digital curation
     • Why are data important?
     • What kinds of data?
     • What to do with your data: frontiers of
       practice
     • Repository frontiers
     • Changing practice



EDUCAUSE Australasia 2007




  Digital Curation Centre Mission
       “The over-riding purpose of the DCC is to
       support and promote continuing improvement
       in the quality of data curation, and of
       associated digital preservation”








                Summarising…
     • Sustainability
     • Creation or selection
     • Growth, development
     • Making available
     • Access management
     • Re-usability
     • Linkage, context, metadata
     • Authenticity, integrity, provenance
     • Maintaining meaning over time
     • Preserving, including past states
     • De-selection…
     • Extended time
     • Budget and policy impacts
     • People issues!






           Science and curation
     • Creating and managing data suitable for re-use
     • Good curation supports good science (managing
       your data properly)
        • Poor curation allows sloppy science?
     • Data curation should save money
        • Murray-Rust/Frey on interesting but fruitless experiments!
     • Some science impossible without curation…
        • QCD strong coupling constant prediction (Bethke)
        • Viscosity of earth mantle from Shang Dynasty eclipse
          records (Pang et al)
        • Science depending on past baselines (eg environmental,
          social sciences)





            Records of science
     • Data increasingly important as evidence
        • Key part of the scholarly record (public good)
           • Unrepeatable observations & experiments
        • Experimental verifiability (the basis of science)
           • Would the Chang retractions have been reduced if his first
             data had been available?
        • Allows additional interpretations
        • Legal and compliance
           • See APSR/AERES report for good examples







           What kinds of data?
     • Observations
        • eg UARS (Upper Atmosphere) Level 0: telemetry
        • UARS Level 1: measured physical parameters (post
          calibration?)
     • Derived data
        • UARS Level 2: calculated geophysical? profiles
        • UARS Level 3: gridded, interpolated?
     • Combined data
     • Crafted data
        • Eg annotated gene/protein databases
     • Descriptive (meta)data






Retaining research data means…
     • Data secure against loss (within group)
     • Communal repository (secure bit dump)
     • Re-usable, sharable information
     • As above, plus active curation (eg bio-informatics)
     • Long term preservation of information

     • Be clear what you are trying to do!





     … or the data trajectory is…
     • Hard drive → lost (crash)
     • Hard drive →DVD →Cardboard box →Loft
       →Skip/dumpster → lost

     • Sometimes this is a very bad thing
     • Sometimes these are the right options!








        Long term bit storage…
     • A solved problem? Just requires well-understood
       good data management practices?
     • Wrong! For very large datasets kept over very long
       periods, there are significant problems…




                 BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T.
                 J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys
                 '06. Leuven, Belgium, ACM.





   How Well Must We Preserve?
   Keep a petabyte for a century
   – With 50% chance of remaining completely undamaged

   Consider each bit decaying independently
   – Analogy with radioactive decay

   That's a bit half-life of 10**18 years
   – One hundred million times the age of the universe

   That's a very demanding requirement
   – Hard to measure
   – Even very unlikely faults will matter a lot
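
The half-life figure follows from simple arithmetic. A minimal sketch of it, assuming a decimal petabyte (8 × 10**15 bits), independent exponential decay per bit as on the slide, and an age of the universe of roughly 1.38 × 10**10 years (these values are assumptions, not taken from the slide):

```python
import math

PETABYTE_BITS = 8 * 10**15      # assumed: decimal petabyte, 10**15 bytes
YEARS = 100                     # keep the data for a century
TARGET_SURVIVAL = 0.5           # 50% chance that no bit at all is damaged
UNIVERSE_AGE_YEARS = 1.38e10    # assumed approximate value

def required_bit_half_life(n_bits, years, p_all_survive):
    """Half-life T such that all n_bits survive `years` with probability p.

    Each bit survives with probability 2**(-years / T). With independent
    bits, 2**(-n_bits * years / T) = p, so T = n_bits * years / log2(1 / p).
    """
    return n_bits * years / math.log2(1.0 / p_all_survive)

half_life = required_bit_half_life(PETABYTE_BITS, YEARS, TARGET_SURVIVAL)
ratio = half_life / UNIVERSE_AGE_YEARS
print(f"required bit half-life: {half_life:.1e} years")
print(f"ratio to age of universe: {ratio:.1e}")
```

Under these assumptions the result is of the order of 10**18 years, tens of millions of times the age of the universe, which is the scale the slide quotes.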

                           •Slide from David Rosenthal, LOCKSS




       What to do about curation
     • Build curation/reusability into your workflow
        • Curation begins before creation
        • What’s easy at first becomes (impossibly) hard
          later
        • Describe your data (metadata schemas,
          “representation info”, etc)
        • Keep experimental parameters (technical, who,
          what, when, where)
        • Keep ability to process
        • Keep data!
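
One way to keep the experimental parameters with the data is to write a small descriptive record at creation time. The sketch below is illustrative only: the `describe_dataset` helper, the field names and the sample values are hypothetical, not a DCC or community metadata schema.

```python
import json
from datetime import datetime, timezone

def describe_dataset(filename, who, what, where, technical, when=None):
    """Build a minimal descriptive record (hypothetical fields, not a standard schema)."""
    when = when or datetime.now(timezone.utc)
    return {
        "file": filename,
        "who": who,
        "what": what,
        "when": when.isoformat(),
        "where": where,
        "technical": technical,  # instrument settings, software versions, units
    }

# Made-up example values, for illustration only.
record = describe_dataset(
    "run_042.csv",
    who="A. Researcher",
    what="Sample 7, measurement series B",
    where="Lab 3",
    technical={"instrument": "example-instrument", "temperature_K": 293.15},
    when=datetime(2007, 5, 1, 9, 30, tzinfo=timezone.utc),
)
sidecar = json.dumps(record, indent=2, sort_keys=True)
print(sidecar)  # store this next to run_042.csv, e.g. as run_042.json
```

Capturing the record in the same step that creates the data file is what makes it cheap; reconstructing the same facts years later is the "impossibly hard" case.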






    What to do about curation - 2
     • Use standard/agreed formats for data
     • Make ownership & restrictions clear, &
       explain how to cite your data
     • Offer for deposit in institutional or discipline
       repository
        • Appraisal and selection essential
        • Possible time-limited embargoes
     • “Publish” data in support of articles






 Internet Archaeology: publication with
                 data








           Database as book…
     • Buneman (early pilot)
       work on IUPHAR
       database
     • MySQL to XML
       database
        • Historic to logical
          schema
     • XML via XSLT to LaTeX
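
The pipeline named above (database rows to XML, XML to LaTeX) can be sketched in a few lines. This is not the IUPHAR project's code: the rows are made-up placeholders, the element names are hypothetical, and a plain Python function stands in for the XSLT step because the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Placeholder rows standing in for the result of a database query.
rows = [
    {"id": "1", "name": "Receptor A", "family": "Family X"},
    {"id": "2", "name": "Receptor B", "family": "Family X"},
]

def rows_to_xml(rows, root_tag="database", row_tag="entry"):
    """Turn flat rows into an XML tree: one element per row, one child per column."""
    root = ET.Element(root_tag)
    for row in rows:
        entry = ET.SubElement(root, row_tag, id=row["id"])
        for key, value in row.items():
            if key != "id":
                ET.SubElement(entry, key).text = value
    return root

def xml_to_latex(root):
    """Render each entry as a LaTeX description item (stand-in for an XSLT stylesheet).

    No LaTeX escaping is done here; real data would need it.
    """
    lines = [r"\begin{description}"]
    for entry in root:
        lines.append(rf"\item[{entry.findtext('name')}] {entry.findtext('family')}")
    lines.append(r"\end{description}")
    return "\n".join(lines)

tree = rows_to_xml(rows)
latex = xml_to_latex(tree)
print(ET.tostring(tree, encoding="unicode"))
print(latex)
```

Keeping XML as the intermediate form means the same archived content can be re-rendered later with a different stylesheet.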








                The StORe vision
     • Seamless transport from research data to
       research publications and vice versa
     • Bi-directional links proven in social science
       e-research but capable of export to other
       disciplines

     [Diagram: Source, Middleware, Output]




                 •http://jiscstore.jot.com/WikiHome/
                 •Slide from Graham Pryor




 What are the reusability issues?
     • Data not neutral to hypothesis
     • Hard to know the risks & pitfalls of a particular
       dataset
     • Data not self-describing: hard to find
       appropriate data (but see Murray-Rust on
       Googling InChI etc)
     • Hard to “understand” data once found
        • Really need information, not data!
     • Hard to use data once understood





                     Context
     • Data meaningless without context
        • Metadata of many kinds
        • Representation information… from data to
          information
        • Linkage and connection between datasets
        • Use your workflow!
     • Provenance
        • Authenticity/integrity
        • Computational lineage






     [Figure: data lineage diagram. Node labels include: NASA; Csat 8-day composite and
     subscene; PAR subscene; SST; Pbopt calc; Ctot calc; Zeu calc; PPeu calc; University
     research groups 1, 2 and 3; local decision-making body]



                 Slide from Rajendra Bose




            Access and re-use
     • Ethics and rights control access
        • Weak in expressing this long-term
     • Collaboration tools
        • Annotation, discussion, review (see DART…)
        • Re-use leading to change and development
     • “Publication”
        • Not just in “print”
        • Underlying data should be “published”, too






      Database citation issues…
     • Citation for human readers and machine use cases
     • Granularity: database, record, item
     • Citation of changing objects
        • Version change (eg W3C practice: no version = latest, vs bibliographic
          practice: no version = first)
        • Need an efficient way to reference and access “archived” past states of
          rapidly changing datasets, eg Genomics… datasets that result
          from the combined work of curators, or contain opinions or facts likely
          to change (work in progress, Buneman et al)
     • Standards conflicting and immature (NLM best?)

     • Citation ESSENTIAL for motivating quality academic work on data
       management and curation
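
The two conventions for an unversioned citation give different answers for the same dataset. A small sketch of the difference; `resolve_citation` and the release labels are hypothetical and do not implement any citation standard.

```python
def resolve_citation(versions, cited_version=None, convention="w3c"):
    """Pick the dataset version a citation refers to.

    versions: list of version labels, oldest first.
    cited_version: explicit version in the citation, or None if absent.
    convention: "w3c" (no version = latest) or
                "bibliographic" (no version = first).
    """
    if not versions:
        raise ValueError("no versions available")
    if cited_version is not None:
        if cited_version not in versions:
            raise KeyError(f"unknown version: {cited_version}")
        return cited_version
    if convention == "w3c":
        return versions[-1]
    if convention == "bibliographic":
        return versions[0]
    raise ValueError(f"unknown convention: {convention}")

releases = ["2005-06", "2006-01", "2007-05"]  # made-up release labels
latest = resolve_citation(releases)
first = resolve_citation(releases, convention="bibliographic")
pinned = resolve_citation(releases, cited_version="2006-01")
print(latest, first, pinned)
```

Only the pinned form keeps pointing at the same state as the database changes, which is why access to archived past states matters for citation.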






            Who does curation?
     •   Individuals
     •   Departments or groups
     •   Institutions, maybe through libraries
     •   Communities
     •   Disciplines
     •   Publishers
     •   National services
     •   Other 3rd parties…





            Curation: Individual
     • “Small science 2-3 times more data than Big
       science”, but much more at risk
     • PhD student? RA? PI? Administrator? IT support?
     • Data potentially on local hard drives, or at best
       shared network drives
        • May be inadequately protected
        • Liable to policy-led deletion on resignation
     • Individual “knows” too much (tacit knowledge)
        • Documentation/metadata unlikely to be adequate
     • Future: gone!





            Curation: Individual




                 •© Marita Bushell




         Department: eCrystals
                                       •   Partnership with Institutional
                                           Repository
                                       •   Specialist department
                                           archive (& national service)
                                       •   Workflow recording of lab
                                           parameters (R4L)
                                       •   Public & private elements
                                       •   Trying to build eCrystals
                                           federation (eBank 3)
                                       •   Future: likely to continue








  Data in institutional repositories








    Institution: Cambridge Chemistry
                                         •   175,000 small molecule
                                             structures in CML
                                         •   Alongside Archaeology,
                                             Manuscripts, Learning
                                             Materials, etc
                                         •   No library curation skills;
                                             dependent on research
                                             group enthusiast
                                         •   Collection isolated from
                                             other Chemistry
                                         •   (Only 5 UK institutional
                                             repositories claim to hold
                                             data)
                                         •   Future: assured…





         Community: LOCKSS?
                                         •   Self-selected group of
                                             collectors: closest to genuine
                                             open activity (despite
                                             Alliance)?
                                         •   Traditionally libraries
                                             collecting eJournals
                                         •   Model respects IPR
                                         •   No domain expertise; rely on
                                             origins
                                         •   Data limitations…
                                         •   Future: potentially very
                                             persistent (low cost, high
                                             reliability, attack resistance,
                                             distributed)





 Discipline: Atmospheric Science
                                         • Strong believer in need
                                           for domain scientists as
                                           curators
                                         • Significant participant in
                                           “community proxy”
                                           agenda-setting activities
                                         • Internationally
                                           fragmented resources
                                         • Future: mostly
                                           dependent on grant
                                           funding (but strong
                                           commitment)





       Discipline: Pharmacology
                                         • International Scientific
                                           Union
                                         • Attempting to build
                                           credit for data
                                           contributions
                                         • Future: extremely
                                           limited funding








  Bio-informatics: Nature article
            23 June 05
    • Databases in Peril
        • 51 out of 89 biological databases contacted reported they
          were struggling financially
        • 7 have closed
        • Several being updated in owner’s spare time
        • (Notes that not all deserve long term support)

    • [Nucleic Acids Research reports 968 databases in
      2007!]
    • Major issue: money





      Publisher: Crystallography
                                         •   Publisher and Scientific
                                             Union
                                         •   Created key domain
                                             crystallographic standard
                                             (CIF)
                                         •   Strong motivator for deposit
                                             of structure data
                                         •   Consistent quality checks
                                         •   DOIs used for structure data
                                         •   Future: publishing business
                                             model



                 •Slide from IUCr




   National bodies: British Library
                                         • Serious and robust
                                           approach
                                         • Legal deposit powers &
                                           responsibilities as driver
                                         • Oriented primarily
                                           towards “cultural
                                           heritage” (broadly
                                           interpreted)
                                         • Little data, no science
                                           domain experience
                                         • Future: strong future
                                           commitment





    National bodies: TNA/NDAD
                                         •   Specialist archive for
                                             government datasets
                                         •   Understand government
                                             regulations, dynamics &
                                             requirements
                                         •   Subject generalists;
                                             disconnected from
                                             associated science
                                         •   Technology specialists
                                             (understand databases)
                                         •   Future: likely to pass
                                             eventually to The National
                                             Archives







            3rd parties: Portico
                                         • Specific area: eJournals
                                         • Depends on publisher
                                           agreements
                                         • No data or domain
                                           science expertise
                                         • Future: commitment
                                           from Mellon +
                                           publishers +
                                           subscriptions, good
                                           funding mix






     3rd Parties: Iron Mountain?
                                         • Records management
                                           IS a curation problem
                                         • Organisations like this
                                           very likely to branch out
                                         • No domain science
                                           expertise
                                         • Future: business case,
                                           viability, stock market…








      3rd parties: Web 2.0 style,
            Swivel.com??








       Institutions & the network
     • Institutions have fundamental sustainability
     • Disciplines have domain knowledge advantage
       but sustainability is an issue
     • Can we get the best of both?
     • Needs serious work to examine!

                        Inst’n 1   Inst’n 2   Inst’n 3
     Discipline 1          X                     X
     Discipline 2                     X          X
     Discipline 3          X          X
     etc






   Who are the curation players?








               Cultural change
     • If we build it, will they come? NO!!
     • Outreach important: communication with
       scientists and researchers is hard graft
     • Cultural change to new approach requires more:
        • Incentives, rewards and mandates
        • Successful exemplars (well publicised)
        • Discipline-oriented approach (one size does not fit all)








            Australian context?
     • In the emerging context of the Research
       Quality Framework, and the expected
       National Collaborative Research
       Infrastructure Strategy, curation can only
       increase in importance!








                            Thank you




                               •(Citations in paper in proceedings)
EDUCAUSE Australasia 2007

Create, curate, re-use: the expanding life course of digital research data

  • 1.
    a centre ofexpertise in data curation and preservation Create, curate, re-use: the expanding life course of digital research data Chris Rusbridge EDUCAUSE Australasia May 2007 Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
  • 2.
    a centre ofexpertise in data curation and preservation Contents • Science and digital curation • Why are data important? • What kinds of data? • What to do with your data: frontiers of practice • Repository frontiers • Changing practice EDUCAUSE Australasia 2007
  • 3.
    a centre ofexpertise in data curation and preservation Digital Curation Centre Mission “The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation” EDUCAUSE Australasia 2007
  • 4.
    a centre ofexpertise in data curation and preservation EDUCAUSE Australasia 2007
  • 5.
    a centre ofexpertise in data curation and preservation Summarising… • Sustainability • Maintaining meaning • Creation or selection over time • Growth, development • Preserving, including • Making available past states • Access management • De-selection… • Re-usability • Extended time • Linkage, context, • Budget and policy metadata impacts • Authenticity, integrity, • People issues! provenance EDUCAUSE Australasia 2007
  • 6.
    a centre ofexpertise in data curation and preservation Science and curation • Creating and managing data suitable for re-use • Good curation supports good science (managing your data properly) • Poor curation allows sloppy science? • Data curation should save money • Murray-Rust/Frey on interesting but fruitless experiments! • Some science impossible without curation… • QCD strong coupling constant prediction (Bethke) • Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al) • Science depending on past baselines (eg environmental, social sciences) EDUCAUSE Australasia 2007
  • 7.
    a centre ofexpertise in data curation and preservation Records of science • Data increasingly important as evidence • Key part of the scholarly record (public good) • Unrepeatable observations & experiments • Experimental verifiability (the basis of science) • Would Chang retractions have been reduced if his first data were available? • Allows additional interpretations • Legal and compliance • See APSR/AERES report for good examples EDUCAUSE Australasia 2007
  • 8.
    a centre ofexpertise in data curation and preservation What kinds of data? • Observations • eg UARS (Upper Atmosphere) Level 0: telemetry • UARS Level 1: measured physical parameters (post calibration?) • Derived data • UARS Level 2: calculated geophysical? profiles • UARS level 3: gridded, interpolated? • Combined data • Crafted data • Eg annotated gene/protein databases • Descriptive (meta)data EDUCAUSE Australasia 2007
  • 9.
    a centre ofexpertise in data curation and preservation Retaining research data means… • Data secure against loss (within group) • Communal repository (secure bit dump) • Re-usable, sharable information • As above, plus active curation (eg bio- informatics) • Long term preservation of information • Be clear what you are trying to do! EDUCAUSE Australasia 2007
  • 10.
    a centre ofexpertise in data curation and preservation … or the data trajectory is… • Hard drive → lost (crash) • Hard drive →DVD →Cardboard box →Loft →Skip/dumpster → lost • Sometimes this is a very bad thing • Sometimes these are the right options! EDUCAUSE Australasia 2007
  • 11.
    a centre ofexpertise in data curation and preservation Long term bit storage… • A solved problem? Just requires well- understood good data management practices? • Wrong! For very large datasets over very long time, there are significant problems… BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOLOUS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM. EDUCAUSE Australasia 2007
  • 12.
    a centre ofexpertise in data curation and preservation How Well Must We Preserve? Keep a petabyte for a century – With 50% chance of remaining completely undamaged Consider each bit decaying independently – Analogy with radioactive decay That's a bit half- life of 10**18 years – One hundred million times the age of the universe That's a very demanding requirement – Hard to measure – Even very unlikely faults will matter a lot EDUCAUSE Australasia 2007 •Slide from David Rosenthal, LOCKSS
  • 13.
    a centre ofexpertise in data curation and preservation What to do about curation • Build curation/reusability into your workflow • Curation begins before creation • What’s easy at first becomes (impossibly) hard later • Describe your data (metadata schemas, “representation info”, etc) • Keep experimental parameters (technical, who, what, when, where) • Keep ability to process • Keep data! EDUCAUSE Australasia 2007
  • 14.
    a centre ofexpertise in data curation and preservation What to do about curation - 2 • Use standard/agreed formats for data • Make ownership & restrictions clear, & explain how to cite your data • Offer for deposit in institutional or discipline repository • Appraisal and selection essential • Possible time-limited embargos • “Publish” data in support of articles EDUCAUSE Australasia 2007
  • 15.
    a centre ofexpertise in data curation and preservation Internet Archaeology: publication with data EDUCAUSE Australasia 2007
  • 16.
    a centre ofexpertise in data curation and preservation Database as book… • Buneman (early pilot) work on IUPHAR database • MySQL to XML database • Historic to logical schema • XML via XSLT to LaTeX EDUCAUSE Australasia 2007
  • 17.
    a centre ofexpertise in data curation and preservation The StORe vision • Seamless transport Source from research data to research publications and vice versa ware • Bi-directional links Middle proven in social science e-research but capable of export to other disciplines Output •http://jiscstore.jot.com/WikiHome/ EDUCAUSE Australasia 2007 •Slide from Graham Pryor
  • 18.
    a centre ofexpertise in data curation and preservation What are the reusability issues? • Data not neutral to hypothesis • Hard to know the risks & pitfalls of a particular dataset • Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChi etc) • Hard to “understand” data once found • Really need information, not data! • Hard to use data once understood EDUCAUSE Australasia 2007
  • 19.
    a centre ofexpertise in data curation and preservation Context • Data meaningless without context • Metadata of many kinds • Representation information… from data to information • Linkage and connection between datasets • Use your workflow! • Provenance • Authenticity/integrity • Computational lineage EDUCAUSE Australasia 2007
  • 20.
    a centre ofexpertise in data curation and preservation NASA Csat8-day composite and subsceneCsat 8-day composite subscene PAR subscene RPT E0SST and Pbopt calc H Ctot calc Zeu calc PPeu calc University research University group3 local research research decision- group1 group2 making body EDUCAUSE Australasia 2007 Slide from Rajendra Bose
  • 21.
    a centre ofexpertise in data curation and preservation Access and re-use • Ethics and rights control access • Weak in expressing this long-term • Collaboration tools • Annotation, discussion, review (see DART…) • Re-use leading to change and development • “Publication” • Not just in “print” • Underlying data should be “published”, too EDUCAUSE Australasia 2007
  • 22.
    a centre ofexpertise in data curation and preservation Database citation issues… • Citation for human readers and machine use cases • Granularity: database, record, item • Citation of changing objects • Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first) • An efficient way to reference and access “archived” past states of more rapidly changing dataset, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al) • Standards conflict and immature (NLM best?) • Citation ESSENTIAL for motivating quality academic work on data management and curation EDUCAUSE Australasia 2007
  • 23.
    a centre ofexpertise in data curation and preservation Who does curation? • Individuals • Departments or groups • Institutions, maybe through libraries • Communities • Disciplines • Publishers • National services • Other 3rd parties… EDUCAUSE Australasia 2007
  • 24.
    Curation: Individual • “Small science 2-3 times more data than Big science”, but much more at risk • PhD student? RA? PI? Administrator? IT support? • Data potentially on local hard drives, or at best shared network drives • May be inadequately protected • Liable to policy-led deletion on resignation • Individual “knows” too much (tacit knowledge) • Documentation/metadata unlikely to be adequate • Future: gone!
  • 25.
    Curation: Individual • © Marita Bushell
  • 26.
    Department: eCrystals • Partnership with Institutional Repository • Specialist department archive (& national service) • Workflow recording of lab parameters (R4L) • Public & private elements • Trying to build eCrystals federation (eBank 3) • Future: likely to continue
  • 27.
    Data in institutional repositories
  • 28.
    Institution: Cambridge Chemistry • 175,000 small molecule structures in CML • Alongside Archaeology, Manuscripts, Learning Materials, etc • No library curation skills; dependent on research group enthusiast • Collection isolated from other Chemistry • (Only 5 UK institutional repositories claim to hold data) • Future: assured…
  • 29.
    Community: LOCKSS? • Self-selected group of collectors: closest to genuine open activity (despite Alliance)? • Traditionally libraries collecting eJournals • Model respects IPR • No domain expertise; rely on origins • Data limitations… • Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
  • 30.
    Discipline: Atmospheric Science • Strong believer in need for domain scientists as curators • Significant participant in “community proxy” agenda-setting activities • Internationally fragmented resources • Future: mostly dependent on grant funding (but strong commitment)
  • 31.
    Discipline: Pharmacology • International Scientific Union • Attempting to build credit for data contributions • Future: extremely limited funding
  • 32.
    Bio-informatics: Nature article 23 June 05 • Databases in Peril • 51 out of 89 biological databases contacted reported they were struggling financially • 7 have closed • Several being updated in owner’s spare time • (Notes that not all deserve long term support) • [Nucleic Acids Research reports 968 databases in 2007!] • Major issue: money
  • 33.
    Publisher: Crystallography • Publisher and Scientific Union • Created key domain crystallographic standard (CIF) • Strong motivator for deposit of structure data • Consistent quality checks • DOIs used for structure data • Future: publishing business model • Slide from IUCr
  • 34.
    National bodies: British Library • Serious and robust approach • Legal deposit powers & responsibilities as driver • Oriented primarily towards “cultural heritage” (broadly interpreted) • Little data, no science domain experience • Future: strong future commitment
  • 35.
    National bodies: TNA/NDAD • Specialist archive for government datasets • Understand government regulations, dynamics & requirements • Subject generalists; disconnected from associated science • Technology specialists (understand databases) • Future: likely to pass eventually to The National Archives
  • 36.
    3rd parties: Portico • Specific area: eJournals • Depends on publisher agreements • No data or domain science expertise • Future: commitment from Mellon + publishers + subscriptions, good funding mix
  • 37.
    3rd Parties: Iron Mountain? • Records management IS a curation problem • Organisations like this very likely to branch out • No domain science expertise • Future: business case, viability, stock market…
  • 38.
    3rd parties: Web 2.0 style, Swivel.com??
  • 39.
    Institutions & the network • Institutions have fundamental sustainability • Disciplines have domain knowledge advantage but sustainability is an issue • Can we get the best of both? • Needs serious work to examine! [Table: Disciplines 1, 2, 3, etc (rows) against Institutions 1, 2, 3 (columns); each discipline row carries two X marks, column positions not recoverable]
  • 40.
    Who are the curation players?
  • 41.
    Cultural change • If we build it, will they come? NO!! • Outreach important: communication with scientists and researchers is hard graft • Cultural change to new approach requires more: • Incentives, rewards and mandates • Successful exemplars (well publicised) • Discipline-oriented approach (one size does not fit all)
  • 42.
    Australian context? • In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
  • 43.
    Thank you • (Citations in paper in proceedings)