Dcc endeavour-2006

Published in: Technology, Education
  • Initially we have concentrated on data extracted from relational databases, mainly because this is where the IUPHAR data is. 1) Extract to XML (a friendly hierarchical format). 2) Merge with the archive containing the previous versions. 3) Process and merge. 4) New archive with the latest version added. Demo ...
  • Transcript

    • 1. a centre of expertise in data curation and preservation Experience is a hard teacher… Curation and the Digital Record Chris Rusbridge Endeavor EndUser 2006 Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
    • 2. “Experience is a hard teacher because she gives the test first, the lesson afterwards” • Vernon Sanders Law, ex baseball player • (Or perhaps, in the case of digital preservation, the test occurs long after you are dead?)
    • 3. Contents • Curation • Sustainability • Data resources • Preservation & curation issues • OAIS Review
    • 4. Curation • Data increasingly important as evidence • Experimental verifiability (the basis of science) • Unrepeatable observations & experiments (particularly environmental in broadest sense) • Legal, compliance & transactions • Cultural resources • For evidential value, data must be curated
    • 5. Curation • “Maintaining and adding value to a trusted body of digital information for current and future use”
    • 6. Lynch remarks • Closing the 2005 Curation Conference • 3 views of digital curation • Collection as a living thing • Whole life process, evolving object(s) • Finite process, handover to preservation
    • 8. This is what you do!
    • 9. Sustainability and exit strategy • Most critical resource for curation: present and future money supply! • Plan for the long term, but have a succession plan • Sustained approach not project mentality
    • 10. Sustainability and exit strategy • Most critical resource for curation: present and future money supply! • Plan for the long term, but have a succession plan • Sustained approach not project mentality • This is what you do!
    • 11. Some illustrations: UK census • 1881 census (UKDA) • Hand-written individual return forms: data conversion issue (reference form available): digitisation and access issues • 1961 census (TNA/NDAD) • First using computers to analyse (first major UK-wide computer project?); individual returns closed until 2062: data preservation issue!!! • 2001 census (ONS/CDU) • Data corrections and adjustments: curation issue
    • 12. Curation of emails • Lots of metadata and context (RFC 822) • Often highly distributed • Split conversations • Unknown numbers of copies • Personal choice of clients • Legal requirements! • Controlled filing and controlled deletion needed…
    • 14. Online Public Access Catalogues • Long term, curated databases • Often high quality (not always) • Well known interchange standards (MARC), classification standards (several), name authorities… • Still significant problems combining sources
    • 17. Pre 2000 • Post 2000 • © Chris Rusbridge “… The database provides indexing of 527 key international English-language business periodicals including Business Week, Forbes, The Wall Street Journal, The New York Times and more. Also included are product reviews, interviews, biographical sketches, corporate profiles, reports of associations, societies and conferences. Broad areas of coverage include accounting, acquisitions and mergers, advertising, banking, chemicals, …” • Peter Buneman
    • 18. • Storage – Redundant, Distributed – Persistent – Readable • Clear standards for citation • Historical record (old data is useful) • Well understood ownership/IP • © Chris Rusbridge “… The database provides indexing of 527 key international English-language business periodicals including Business Week, Forbes, The Wall Street Journal, The New York Times and more. Also included are product reviews, interviews, biographical sketches, corporate profiles, reports of associations, societies and conferences. Broad areas of coverage include accounting, acquisitions and mergers, advertising, banking, chemicals, …” • Storage – Single-source – Volatile – Centralised – Internal DBMS format • No standards for citation • No historical record • Mind-boggling legal issues • Peter Buneman
    • 19. TWOMASS (Infrared) • SDSS (Visual) • Slide from Rajendra Bose
    • 20. Slide from Rajendra Bose
    • 21. Example… • National Virtual Observatory • Johns Hopkins press release: “Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.”
    • 22. Context • Data meaningless without context • Linkage • Metadata of many kinds • Workflow! • Provenance • Computational lineage • Authenticity
    • 23. Access and re-use • Ethics and rights control access • Weak in expressing this long-term • Collaboration tools • Annotation, discussion, review • Re-use leading to change and development • “Publication” • Not just in “print” • Underlying data should be “published”, too • Citation…
    • 24. Citation • Needs a stable resource to cite… OWL Web Ontology Language Reference W3C Proposed Recommendation 15 December 2003 This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/ Latest version: http://www.w3.org/TR/owl-ref/ Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081
    • 25. Citation… • The date alone (as in common web citation approaches) is not enough! • [6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006. • Cited object likely to have changed… • Citation should link to the cited object as it was!
    • 26. Citation needs… • An efficient way to reference and access “archived” past states of a changing dataset (work in progress, Buneman et al) • Less important for original observations • Don’t mess with those data • Less important for incremental datasets • Later stuff should not invalidate earlier • Very important for revisable datasets • E.g. genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change
    • 27. XMLArch: System Architecture (diagram labels) • Relational Database • Data Extractor • XML Snapshot at time t • XML Archiver • Pre-processor • Version Merger • XML Archive at time t - 1 • XML Archive at time t • Carwyn Edwards
    • 28. Preservation & curation • Use preserves • Money preserves • Redundancy good, monoculture bad? • LOCKSS-type & other approaches… • Bits are fragile and robust • Don’t rely on portable media • Look after them well • Technology changes… • How fast? What impact? • Metadata matters! (Know what you’ve got)
    • 29. Formats, migration, significant properties… • “We MUST preserve the look and feel!” • Well… • Think about a book like “Kenilworth” by Walter Scott • Think about the BBC Domesday emulation • You may be better with a preserved “desiccated” version… than nothing at all!
    • 30. The Project Gutenberg EBook of Kenilworth, by Sir Walter Scott This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org Title: Kenilworth Author: Sir Walter Scott Release Date: February 21, 2006 [EBook #1606] Language: English Character set encoding: ASCII *** START OF THIS PROJECT GUTENBERG EBOOK KENILWORTH *** Produced by An Anonymous Volunteer and David Widger KENILWORTH. by Sir Walter Scott, Bart. INTRODUCTION A certain degree of success, real or supposed, in the delineation of Queen Mary, naturally induced the author to attempt something similar respecting "her sister and her foe," the celebrated Elizabeth. He will not, however, pretend to have approached the task with the same feelings; for the candid Robertson himself confesses having felt the prejudices with which a Scottishman is tempted to regard the subject; …
    • 31. But there ARE limits!
    • 32. Preservation is not cheap • But it’s not expensive…
    • 33. Preservation is not cheap • But it’s not expensive… • Compared with the alternative!
    • 34. Preservation is not cheap • But it’s not expensive… • Compared with the alternative! • Postcard sent to me anonymously
    • 35. Curation: whose job is it? • Yours! • With your archivists • And your Records Managers • And your scientists and scholars…
    • 36. Preservation & curation • We can’t do it alone • Collective responsibility • We can’t rely on anyone else • Institutional responsibility
    • 37. It’s about time… • From the very short • Good management (don’t under-estimate but don’t over-estimate) • Through the medium term • Curation: use it or lose it • Gather ye metadata while ye may! • Preservation relay • To the very long term • High commitment, high cost, high risk • Harder to do en masse
    • 38. Supplier role? • Work together with libraries… • Multi-supplier, Multi-platform • Open source mix • The library is not simple any more • Library 2.0? • Power of crowds, economy of attention, generation X… • Wikicat?
    • 40. Supplier role? • Work together with libraries… • Multi-supplier, Multi-platform • Open source mix • The library is not simple any more • Library 2.0? • Power of crowds, economy of attention, generation X… • Wikicat? • Web 2.0? • Mix, mashup • What you see is… not there?
    • 41. BEWARE WEB 2.0!!!
    • 42. OAIS • “Announcement of a Comment Period for the Five Year Review of the Reference Model for an Open Archival Information System (OAIS) Standard” • “… must be reviewed every five years and a determination made to reaffirm, modify, or withdraw the existing standard.” • “…any revision must remain backward compatible with regard to major terminology and concepts.” • “… we do not plan to expand the general level of detail” • “… reduce ambiguities and fill in any missing or weak concepts” • Make suggestions and express interest until 30/10/06 • OAIS-support@delight.gsfc.nasa.gov
    • 43. To close… • Your library is currently taking the curation test… • Your children will learn the answer! • But
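
The archiving pipeline on slide 27 and in the speaker note (extract a relational database to an XML snapshot, then merge it into the archive of previous versions) can be illustrated with a minimal sketch. This is not the XMLArch implementation. The `record` element, its `id` key, the sample data, and the helper names `merge_snapshot` and `snapshot_at` are all assumptions made for illustration. The idea shown is only that each record state is stored once, together with the range of versions in which it was current, so that a citation can point at the dataset "as it was".

```python
import xml.etree.ElementTree as ET


def merge_snapshot(archive, snapshot_xml, version):
    """Merge one XML snapshot into the archive (hypothetical helper).

    The archive maps a record key to a list of states; each state stores
    the record content and the first and last version in which it held.
    """
    root = ET.fromstring(snapshot_xml)
    for record in root.findall("record"):  # assumed element name
        key = record.get("id")             # assumed key attribute
        content = ET.tostring(record, encoding="unicode").strip()
        history = archive.setdefault(key, [])
        last = history[-1] if history else None
        if last and last["content"] == content and last["last"] == version - 1:
            # Unchanged since the previous version: extend the interval.
            last["last"] = version
        else:
            # New or revised record: start a new state at this version.
            history.append({"first": version, "last": version, "content": content})
    return archive


def snapshot_at(archive, version):
    """Reconstruct the records that were current at a given version."""
    result = {}
    for key, history in archive.items():
        for state in history:
            if state["first"] <= version <= state["last"]:
                result[key] = state["content"]
    return result


# Two made-up snapshots: record r2 is revised between version 1 and 2.
V1 = '<db><record id="r1"><name>alpha</name></record><record id="r2"><name>beta</name></record></db>'
V2 = '<db><record id="r1"><name>alpha</name></record><record id="r2"><name>beta-revised</name></record></db>'

archive = {}
merge_snapshot(archive, V1, 1)
merge_snapshot(archive, V2, 2)
print(snapshot_at(archive, 1)["r2"])
```

In this sketch the unchanged record `r1` is stored once with an interval covering both versions, while `r2` has two states. A citation made against version 1 could then be resolved with `snapshot_at(archive, 1)` even after the dataset has been revised.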
