Making Sense of a Digital Collection

A case study of challenges encountered when acquiring and curating digital collections. Presented at the Digital Preservation Coalition workshop on April 23rd, 2015.

Published in: Technology
  1. MAKING SENSE OF A COLLECTION
     Gareth Knight, London School of Hygiene & Tropical Medicine, gareth.knight@lshtm.ac.uk
     Getting Started in Digital Preservation, The Information Technologists, London, 23rd April 2015
     This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License.
  2. Case Studies
     • National service that preserved research, teaching and learning resources in arts & humanities between 1996 and 2008
     • Institutional RDM service that helps LSHTM researchers to curate & preserve research data in public health & tropical medicine
  3. Need for Digital Preservation
     Data + Storage media + Computing device + Operating system + Software application = Information
     • Deteriorate & change over time
     • Obsolete & replaced over time
     What does this mean?
     “Digital information lasts forever – or five years, whichever comes first” (Jeff Rothenberg, 1997)
  4. Climb the preservation mountain
     “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” Neil Beagrie and Maggie Jones (2008)
     Modified version of Caplan’s Preservation Pyramid (top to base):
     • Content can be used
     • Content is understandable
     • Content is rendered accurately
     • Bits are stored exactly
     • Its value is recognised & it is acquired
     • Data exists
     Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
     Caplan: http://journals.ala.org/ltr/article/view/4224/4809
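The "Bits are stored exactly" layer of the pyramid is commonly checked by recording a checksum when a file is acquired and recomputing it later. The following is a minimal sketch of that idea, not a method described in the slides: the function names `sha256_of` and `verify_fixity` are hypothetical, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(recorded):
    """Compare stored digests against the files as they are now.

    `recorded` maps a file path to the digest taken at acquisition.
    Returns the paths that are missing or whose content has changed.
    """
    failures = []
    for path, expected in recorded.items():
        candidate = Path(path)
        if not candidate.is_file() or sha256_of(candidate) != expected:
            failures.append(str(path))
    return failures
```

An empty list from `verify_fixity` suggests the stored bits still match what was acquired; any path returned needs investigation or restoration from another copy.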
  5. Digital Detectives
     • Digital preservation often a process of investigation & deduction
     • Resource intensive
       – Time
       – Physical space
       – Hardware/software costs
     • How much effort are you willing to make? What is good enough?
     https://www.flickr.com/photos/ollieolarte/3028314931
  6. Acquire data
     Acquisition depends upon object to be preserved & how stored
     • Media: Floppy disk, CD/DVD, ZIP/Jaz disk, hard disk, solid state devices, etc.
     • Electronic: Email, cloud services
     Invest in infrastructure to support preservation process
     • Computer hardware
     • Media readers
     • 3rd party services can provide advice and hardware rental where needed
     https://www.flickr.com/photos/adactio/13127134455
  7. Case Study: AHDS History dataset
     Deposited by children of noted researcher in 2006 & processed by GK
     Documentation: Accompanying notes in researcher’s handwriting described a history DB they were working on in 1988.
     Challenges:
     • 5.25" disk drive was available
     • Disk was failing, but managed to create a complete copy on 5th attempt
     • Disk analysis revealed text content… The author's short stories, not a dataset!
     Result: Not accessioned, but children were pleased
     History database created on a Shelton Instruments Sig-Net, running the CP/M 2.2 operating system in 1988 & saved to 5.25” disk
     http://www.old-computers.com/museum/computer.asp?st=1&c=810
  8. Check completeness
     What does the creator intend to provide?
     • Data
     • Documentation
     • Research instruments
     What have they actually provided?
     • Some data
     • Creation software & random files
     • Personal music collection?
     • Request a file manifest:
       – Filename
       – Description
       – Format
     https://www.flickr.com/photos/kyngpao/14455832915
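A file manifest with the three fields named on the slide (filename, description, format) can be drafted automatically and then handed to the depositor to complete. This is a rough sketch under stated assumptions: `build_manifest` and `manifest_to_csv` are hypothetical helper names, the format column is only a guess from the file extension using Python's `mimetypes` module, and the `size_bytes` column is an addition that is not on the slide.

```python
import csv
import io
import mimetypes
from pathlib import Path

FIELDS = ["filename", "description", "format", "size_bytes"]


def build_manifest(root):
    """Return one manifest row per file found under `root`."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        guessed, _ = mimetypes.guess_type(path.name)
        rows.append({
            "filename": path.relative_to(root).as_posix(),
            "description": "",  # left blank for the depositor to fill in
            "format": guessed or "unknown",  # extension-based guess only
            "size_bytes": path.stat().st_size,  # assumption: not on the slide
        })
    return rows


def manifest_to_csv(rows):
    """Serialise manifest rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

An extension-based guess will not catch mislabelled files, so a dedicated format identification tool would be a better source for the format column where one is available.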
  9. Case Study: Early English Books Online
     Collection of 125,000 early printed books deposited for preservation:
     • XML files, scanned TIFFs & PDFs for each page
     • Well structured & labelled
     Problems:
     • Hard disk was failing
     • XML output from Content Management system - incomplete header & missing schema
     • 30% of files referenced in XML were missing
     Solution:
     • Obtained schema & missing files (but took a long, long time)
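A gap like the missing files in this case study can be found by comparing the file references inside the XML with what is actually on disk. The sketch below only illustrates that comparison: the slides do not describe the XML schema, so the `href` attribute name and the `missing_references` function are hypothetical and would need adapting to the real markup.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_references(xml_text, base_dir, attr="href"):
    """List referenced files and those that are absent from `base_dir`.

    Every element carrying the attribute `attr` is treated as a file
    reference relative to `base_dir` (an assumption about the markup).
    Returns a tuple: (all references, references not found on disk).
    """
    base_dir = Path(base_dir)
    root = ET.fromstring(xml_text)
    referenced = [el.get(attr) for el in root.iter() if el.get(attr)]
    missing = [ref for ref in referenced if not (base_dir / ref).is_file()]
    return referenced, missing
```

Dividing `len(missing)` by `len(referenced)` gives the proportion of absent files, which is a useful figure to send back to the depositor along with the list itself.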
  10. Render data
      Decode file format
      Reflect tools & software available at point of creation:
      • Information content
      • Contextual information (documentation/metadata)
      Analyse organisation structure
      Intrinsic relationships important for decoding multi-file objects
      • Filenames & directory structure
      Solution
      • Specialist software may be required to access
      • Liaise with data creators
      How many locks do you have to get through to reach your destination?
      https://www.flickr.com/photos/hawksanddoves/83818392
  11. Case Study: Scientific dataset
      USB stick of LSHTM dataset containing:
      • FCS2.0 - tabular data outlining experiments to count cells, sort them & identify biomarkers
      • Leica Experiment Collection - .lei library file & associated images with embedded metadata
      Challenges:
      • Domain & proprietary formats
        – FITS (file) provides limited info on .lei
        – FCS not recognised
      • Complex relationship in Leica experiment - recorded in filename & internal manifest (partial)
      Solution
      • Store files as-is
      • Obtain text output of FCS files
      • Analyse using open source tools
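When an identification tool does not recognise a domain format, a small signature check can at least confirm what a file claims to be. The sketch below relies on the fact that Flow Cytometry Standard files begin with an ASCII version string such as `FCS2.0`; the function name `fcs_version` is hypothetical, and the check says nothing about whether the rest of the file is valid.

```python
def fcs_version(path):
    """Return the leading version string (e.g. 'FCS2.0') or None.

    Only the first six bytes are examined: the letters 'FCS' followed
    by a digit, a full stop and another digit.
    """
    with open(path, "rb") as handle:
        head = handle.read(6)
    looks_like_fcs = (
        len(head) == 6
        and head[:3] == b"FCS"
        and head[3:4].isdigit()
        and head[4:5] == b"."
        and head[5:6].isdigit()
    )
    return head.decode("ascii") if looks_like_fcs else None
```

A result of `None` means the signature is absent, which is a prompt to go back to the data creator rather than proof that the file is unusable.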
  12. Understand data
      • 17th-18th Century Enlightenment built on information sharing
      • Openness & transparency essential for academic research
        – Evidence of activity
        – Open to scrutiny & replication
      • Can you establish who, what, where, when & how?
      • How much documentation can only be found in the data creator’s head?
      https://www.flickr.com/photos/domiriel/5234590796
  13. Case Study: Adolphe Appia
      Warwick Uni. School of Theatre Studies modelled performance space of Appia's Festspielhaus at Hellerau.
      Collection deposited on several CDs:
      • Digitised photographs of 1991 performance
      • VRML 3D models of performance space
      • Videos of 3D models in .mov format
      • Documentation & Metadata
      Problem
      • Image metadata ‘disappeared’ on transfer
      Solution:
      • Descriptions added to file attributes, which were being removed when written to disc
      • Output file attributes to text file
      • Compressed files and copied to disk
      © King's Visualisation Lab, King's College London http://www.kvl.cch.kcl.ac.uk/appia.html
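Writing file attributes out to a text file before transfer, as this case study did, keeps them safe from media that silently drop them. The slides do not say which attribute mechanism or tool was involved, so the following is only a sketch under assumptions: `dump_attributes` and `read_xattrs` are hypothetical names, the output is one tab-separated line per file, and extended attributes are read only where Python exposes `os.listxattr` (Linux).

```python
import os
from pathlib import Path


def read_xattrs(path):
    """Return extended attributes as a dict, or {} where unsupported."""
    if not hasattr(os, "listxattr"):  # Python only provides this on Linux
        return {}
    try:
        return {name: os.getxattr(path, name).decode("utf-8", "replace")
                for name in os.listxattr(path)}
    except OSError:  # e.g. the filesystem does not support xattrs
        return {}


def dump_attributes(root, out_file):
    """Write one line of attributes per file under `root` to `out_file`.

    Returns the number of files recorded. The output file itself is
    skipped if it happens to sit inside `root`.
    """
    root, out_file = Path(root), Path(out_file)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.resolve() == out_file.resolve():
            continue
        info = path.stat()
        lines.append("\t".join([
            path.relative_to(root).as_posix(),
            f"size={info.st_size}",
            f"mtime={int(info.st_mtime)}",
            f"xattrs={read_xattrs(path)}",
        ]))
    out_file.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")
    return len(lines)
```

The resulting text file can travel alongside the compressed files, so the descriptions survive even if the target medium strips the attributes.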
  14. Final thoughts
      1. Analyse your needs & capabilities
         – What can you do with existing resources?
         – What future investment is possible?
      2. Inform users of your expectations from the outset
         – File formats
         – Documentation
         – File structure & naming conventions
         – Permissions
      3. Help them to fulfil expectations
         – Advice and guidance
      http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/
