Create, curate, re-use: the expanding life course of digital research data
 


Presentation to Educause Australasia 2007

Usage Rights

CC Attribution License

    Presentation Transcript

    • a centre of expertise in data curation and preservation
      Create, curate, re-use: the expanding life course of digital research data
      Chris Rusbridge, EDUCAUSE Australasia, May 2007
      This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
    • Contents
      • Science and digital curation
      • Why are data important?
      • What kinds of data?
      • What to do with your data: frontiers of practice
      • Repository frontiers
      • Changing practice
    • Digital Curation Centre Mission
      “The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
    • Summarising…
      • Sustainability
      • Maintaining meaning over time
      • Creation or selection
      • Growth, development
      • Preserving, including past states
      • Making available
      • Access management
      • De-selection…
      • Re-usability
      • Extended time
      • Linkage, context, metadata
      • Budget and policy impacts
      • Authenticity, integrity, provenance
      • People issues!
    • Science and curation
      • Creating and managing data suitable for re-use
      • Good curation supports good science (managing your data properly)
      • Poor curation allows sloppy science?
      • Data curation should save money
        • Murray-Rust/Frey on interesting but fruitless experiments!
      • Some science impossible without curation…
        • QCD strong coupling constant prediction (Bethke)
        • Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al)
        • Science depending on past baselines (eg environmental, social sciences)
    • Records of science
      • Data increasingly important as evidence
      • Key part of the scholarly record (public good)
      • Unrepeatable observations & experiments
      • Experimental verifiability (the basis of science)
        • Would Chang retractions have been reduced if his first data were available?
      • Allows additional interpretations
      • Legal and compliance
      • See APSR/AERES report for good examples
    • What kinds of data?
      • Observations
        • eg UARS (Upper Atmosphere) Level 0: telemetry
        • UARS Level 1: measured physical parameters (post calibration?)
      • Derived data
        • UARS Level 2: calculated geophysical? profiles
        • UARS Level 3: gridded, interpolated?
      • Combined data
      • Crafted data
        • eg annotated gene/protein databases
      • Descriptive (meta)data
    • Retaining research data means…
      • Data secure against loss (within group)
        • Communal repository (secure bit dump)
      • Re-usable, sharable information
        • As above, plus active curation (eg bio-informatics)
      • Long term preservation of information
      • Be clear what you are trying to do!
    • … or the data trajectory is…
      • Hard drive → lost (crash)
      • Hard drive → DVD → Cardboard box → Loft → Skip/dumpster → lost
      • Sometimes this is a very bad thing
      • Sometimes these are the right options!
    • Long term bit storage…
      • A solved problem? Just requires well-understood good data management practices?
      • Wrong! For very large datasets over a very long time, there are significant problems…
      Baker, M., Shah, M., Rosenthal, D. S. H., Roussopoulos, M., Maniatis, P., Giuli, T. J. & Bungale, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys 06. Leuven, Belgium, ACM.
    • How Well Must We Preserve?
      • Keep a petabyte for a century
        • With 50% chance of remaining completely undamaged
      • Consider each bit decaying independently
        • Analogy with radioactive decay
      • That's a bit half-life of 10**18 years
        • One hundred million times the age of the universe
      • That's a very demanding requirement
        • Hard to measure
        • Even very unlikely faults will matter a lot
      Slide from David Rosenthal, LOCKSS
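The half-life figure on this slide can be checked with a short calculation. The sketch below assumes a petabyte of 10**15 bytes, independent exponential decay of each bit, and a rough age of the universe of 1.4 x 10**10 years; the function and constant names are illustrative and do not come from the slide.

```python
import math

BITS_PER_PETABYTE = 8 * 10**15    # assumption: 10**15 bytes, 8 bits each
AGE_OF_UNIVERSE_YEARS = 1.4e10    # assumption: rough figure for comparison


def required_bit_half_life(n_bits, years, survival_probability=0.5):
    """Half-life each bit needs so that ALL n_bits survive `years`
    with the given probability, assuming independent exponential decay.

    One bit survives with probability 2 ** (-years / half_life), so all
    bits survive with probability 2 ** (-n_bits * years / half_life).
    Solving for half_life gives the expression below.
    """
    return n_bits * years / -math.log2(survival_probability)


half_life = required_bit_half_life(BITS_PER_PETABYTE, 100)
print(f"required bit half-life: {half_life:.1e} years")
print(f"ratio to age of universe: {half_life / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the result is about 8 x 10**17 years, which the slide rounds to 10**18, and the ratio to the age of the universe comes out at several tens of millions, the same order of magnitude as the slide's "one hundred million".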
    • What to do about curation
      • Build curation/reusability into your workflow
      • Curation begins before creation
      • What’s easy at first becomes (impossibly) hard later
      • Describe your data (metadata schemas, “representation info”, etc)
      • Keep experimental parameters (technical, who, what, when, where)
      • Keep ability to process
      • Keep data!
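One way to "keep experimental parameters" is to write a small descriptive record next to each data file at creation time. The sketch below is only an illustration: the field names, the `DatasetRecord` class and all values are hypothetical placeholders, not a metadata schema proposed in the talk.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class DatasetRecord:
    """Hypothetical sidecar record: who, what, when, where, plus technical detail."""
    what: str
    who: str
    when: str                      # ISO 8601 timestamp
    where: str
    instrument_settings: dict = field(default_factory=dict)
    data_format: str = "unknown"
    processing_software: str = "unknown"   # needed to "keep ability to process"


def to_sidecar_json(record):
    """Serialise the record as JSON to store alongside the data file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values for illustration only.
record = DatasetRecord(
    what="example measurement run",
    who="A. Researcher",
    when="2007-05-01T09:30:00Z",
    where="Example Laboratory",
    instrument_settings={"temperature_K": 120},
    data_format="CSV",
    processing_software="example-tool 1.0",
)
print(to_sidecar_json(record))
```

Capturing the record when the data are created is cheap; reconstructing the same details years later is the part that becomes "(impossibly) hard".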
    • What to do about curation - 2
      • Use standard/agreed formats for data
      • Make ownership & restrictions clear, & explain how to cite your data
      • Offer for deposit in institutional or discipline repository
      • Appraisal and selection essential
      • Possible time-limited embargoes
      • “Publish” data in support of articles
    • Internet Archaeology: publication with data
    • Database as book…
      • Buneman (early pilot) work on IUPHAR database
      • MySQL to XML database
      • Historic to logical schema
      • XML via XSLT to LaTeX
    • The StORe vision
      • Seamless transport from research data to research publications and vice versa
      • Bi-directional links proven in social science e-research but capable of export to other disciplines
      • Diagram labels: Source, Middleware, Output
      • http://jiscstore.jot.com/WikiHome/
      Slide from Graham Pryor
    • What are the reusability issues?
      • Data not neutral to hypothesis
      • Hard to know the risks & pitfalls of a particular dataset
      • Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChI etc)
      • Hard to “understand” data once found
        • Really need information, not data!
      • Hard to use data once understood
    • Context
      • Data meaningless without context
      • Metadata of many kinds
      • Representation information… from data to information
      • Linkage and connection between datasets
      • Use your workflow!
      • Provenance
      • Authenticity/integrity
      • Computational lineage
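Integrity and computational lineage can both be recorded with checksums. The following is a minimal sketch, assuming SHA-256 as the fixity algorithm; the step name and the byte strings are made up for illustration, and the record layout is not taken from the talk.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Fixity value for a piece of data."""
    return hashlib.sha256(data).hexdigest()


def lineage_entry(step_name, inputs, output):
    """Record which inputs (by checksum) produced which output (by checksum)."""
    return {
        "step": step_name,
        "inputs": [sha256_hex(item) for item in inputs],
        "output": sha256_hex(output),
    }


def verify(data: bytes, expected_checksum: str) -> bool:
    """Integrity check: does the data still match its recorded checksum?"""
    return sha256_hex(data) == expected_checksum


# Illustrative data only: a raw file and a product derived from it.
raw = b"1.0,2.0,3.0\n"
derived = b"mean=2.0\n"
entry = lineage_entry("compute-mean", [raw], derived)

print(verify(derived, entry["output"]))            # unchanged data passes
print(verify(derived + b"x", entry["output"]))     # altered data fails
```

Chaining such entries, with each step's output checksum appearing among the inputs of the next step, gives a simple machine-checkable lineage from raw observations to derived products.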
    • [Diagram labels: NASA; Csat 8-day composite; Csat subscene; PAR subscene; SST; Pbopt calc; Ctot calc; Zeu calc; PPeu calc; University research group1, group2, group3; local decision-making body]
      Slide from Rajendra Bose
    • Access and re-use
      • Ethics and rights control access
        • Weak in expressing this long-term
      • Collaboration tools
        • Annotation, discussion, review (see DART…)
      • Re-use leading to change and development
      • “Publication”
        • Not just in “print”
        • Underlying data should be “published”, too
    • Database citation issues…
      • Citation for human readers and machine use cases
      • Granularity: database, record, item
      • Citation of changing objects
        • Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first)
        • An efficient way to reference and access “archived” past states of more rapidly changing datasets, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al)
      • Standards conflicting and immature (NLM best?)
      • Citation ESSENTIAL for motivating quality academic work on data management and curation
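The two conventions for an unversioned citation resolve to different states of a changing database. The sketch below illustrates the difference; the `Citation` class, the `resolve_version` function and the convention names "latest" and "first" are hypothetical and chosen only to mirror the wording of the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation at database, record or item granularity."""
    database: str
    record: Optional[str] = None
    item: Optional[str] = None
    version: Optional[int] = None   # None means the citation names no version


def resolve_version(citation, available_versions, convention):
    """Pick the archived version a citation refers to.

    convention "latest": no version means the newest state (W3C-style practice)
    convention "first":  no version means the original state (bibliographic practice)
    """
    if not available_versions:
        raise ValueError("no archived versions available")
    if citation.version is not None:
        if citation.version not in available_versions:
            raise LookupError(f"version {citation.version} is not archived")
        return citation.version
    if convention == "latest":
        return max(available_versions)
    if convention == "first":
        return min(available_versions)
    raise ValueError(f"unknown convention: {convention}")


versions = [1, 2, 3]
unversioned = Citation(database="exampledb", record="record-42")
print(resolve_version(unversioned, versions, "latest"))
print(resolve_version(unversioned, versions, "first"))
```

An explicit version in the citation removes the ambiguity, but only if the past state it names is still archived and accessible.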
    • Who does curation?
      • Individuals
      • Departments or groups
      • Institutions, maybe through libraries
      • Communities
      • Disciplines
      • Publishers
      • National services
      • Other 3rd parties…
    • Curation: Individual
      • “Small science 2-3 times more data than Big science”, but much more at risk
      • PhD student? RA? PI? Administrator? IT support?
      • Data potentially on local hard drives, or at best shared network drives
      • May be inadequately protected
      • Liable for policy-led deletion on resignation
      • Individual “knows” too much (tacit knowledge)
      • Documentation/metadata unlikely to be adequate
      • Future: gone!
    • Curation: Individual
      © Marita Bushell
    • Department: eCrystals
      • Partnership with Institutional Repository
      • Specialist department archive (& national service)
      • Workflow recording of lab parameters (R4L)
      • Public & private elements
      • Trying to build eCrystals federation (eBank 3)
      • Future: likely to continue
    • Data in institutional repositories
    • Institution: Cambridge Chemistry
      • 175,000 small molecule structures in CML
      • Alongside Archaeology, Manuscripts, Learning Materials, etc
      • No library curation skills; dependent on research group enthusiast
      • Collection isolated from other Chemistry
      • (Only 5 UK institutional repositories claim to hold data)
      • Future: assured…
    • Community: LOCKSS?
      • Self-selected group of collectors: closest to genuine open activity (despite Alliance)?
      • Traditionally libraries collecting eJournals
      • Model respects IPR
      • No domain expertise; rely on origins
      • Data limitations…
      • Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
    • Discipline: Atmospheric Science
      • Strong believer in need for domain scientists as curators
      • Significant participant in “community proxy” agenda-setting activities
      • Internationally fragmented resources
      • Future: mostly dependent on grant funding (but strong commitment)
    • Discipline: Pharmacology
      • International Scientific Union
      • Attempting to build credit for data contributions
      • Future: extremely limited funding
    • Bio-informatics: Nature article 23 June 05
      • Databases in Peril
        • 51 out of 89 biological databases contacted reported they were struggling financially
        • 7 have closed
        • Several being updated in owner’s spare time
        • (Notes that not all deserve long term support)
      • [Nucleic Acids Research reports 968 databases in 2007!]
      • Major issue: money
    • Publisher: Crystallography
      • Publisher and Scientific Union
      • Created key domain crystallographic standard (CIF)
      • Strong motivator for deposit of structure data
      • Consistent quality checks
      • DOIs used for structure data
      • Future: publishing business model
      Slide from IUCr
    • National bodies: British Library
      • Serious and robust approach
      • Legal deposit powers & responsibilities as driver
      • Oriented primarily towards “cultural heritage” (broadly interpreted)
      • Little data, no science domain experience
      • Future: strong future commitment
    • National bodies: TNA/NDAD
      • Specialist archive for government datasets
      • Understand government regulations, dynamics & requirements
      • Subject generalists; disconnected from associated science
      • Technology specialists (understand databases)
      • Future: likely to pass eventually to The National Archives
    • 3rd parties: Portico
      • Specific area: eJournals
      • Depends on publisher agreements
      • No data or domain science expertise
      • Future: commitment from Mellon + publishers + subscriptions, good funding mix
    • 3rd parties: Iron Mountain?
      • Records management IS a curation problem
      • Organisations like this very likely to branch out
      • No domain science expertise
      • Future: business case, viability, stock market…
    • 3rd parties: Web 2.0 style, Swivel.com??
    • Institutions & the network
      • Institutions have fundamental sustainability
      • Disciplines have domain knowledge advantage but sustainability is an issue
      • Can we get the best of both?
      • Needs serious work to examine!
      • [Grid: Discipline 1, Discipline 2, Discipline 3, etc against Inst’n 1, Inst’n 2, Inst’n 3, with two X marks in each discipline row]
    • Who are the curation players?
    • Cultural change
      • If we build it, will they come? NO!!
      • Outreach important: communication with scientists and researchers is hard graft
      • Cultural change to new approach requires more:
        • Incentives, rewards and mandates
        • Successful exemplars (well publicised)
        • Discipline-oriented approach (one size does not fit all)
    • Australian context?
      • In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
    • Thank you
      • (Citations in paper in proceedings)