Approaches to preserving digitized taxonomic data: Prints, manuscripts & specimens
  • Chris Freeland
  • Director, Center for Biodiversity Informatics
  • Technical Director, Biodiversity Heritage Library
  • 28 October 2011
  • @chrisfreeland
Prints / Manuscripts / Specimens
  • Different objects, similar management
  • http://www.flickr.com/photos/biodivlibrary/6257859557
  • http://www.flickr.com/photos/chrisfreeland/6018724034
  • http://www.biodiversitylibrary.org/page/34045915
Overview of Talk
  • Why worry about digital preservation?
  • Considerations for preservation
      • Collaboration
      • File formats
      • Metadata standards
  • Views to the future
  • Preservation Panic!
WHY WORRY? http://www.flickr.com/photos/biodivlibrary/6008902662
Do it once, do it right
  • Costs more to get object to scanner than to scan
Conversion / Compost / Corruption
  • Longevity of digital objects
  • File changes
  • Media obsolescence
  • Cautionary Tales
CONSIDERATION: COLLABORATION
LOCKSS
  • Lots Of Copies Keeps Stuff Safe
  • LOCKSS is both a software platform & a concept
  • Software: http://www.lockss.org
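The concept side of LOCKSS can be sketched in a few lines: hash every copy, let the majority decide which digest is trustworthy, and overwrite the outliers. This is only an illustration of the idea, not the LOCKSS software or its polling protocol; the function names and site labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of one copy's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare copies by digest.

    Returns (agreed_digest, damaged_sites). agreed_digest is None when no
    digest is held by a strict majority of sites; in that case nothing can
    be trusted automatically and damaged_sites is empty.
    """
    digests = {site: sha256_hex(data) for site, data in copies.items()}
    if not digests:
        return None, []
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        return None, []
    damaged = sorted(site for site, d in digests.items() if d != winner)
    return winner, damaged


def repair(copies: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace damaged copies with a majority copy; leave all untouched if no majority."""
    winner, damaged = audit_copies(copies)
    if winner is None:
        return dict(copies)
    good = next(data for data in copies.values() if sha256_hex(data) == winner)
    return {site: (good if site in damaged else data) for site, data in copies.items()}
```

With only two copies that disagree, the audit cannot tell which one is damaged, which is one practical argument for keeping at least three.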
Rule of 3
  • Museum X / Library Y / Archive Z
  1. Geographic Locations
  2. Administrations
  3. Technology Platforms
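One reading of the Rule of 3 is that a replication plan should span at least three distinct geographic locations, administrations, and technology platforms. Under that assumption, a plan can be checked mechanically. The `Copy` record, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Copy:
    holder: str          # e.g. "Museum X"
    location: str        # geographic location of the copy
    administration: str  # organization responsible for the copy
    platform: str        # storage technology the copy lives on


def rule_of_three_gaps(copies: Iterable[Copy], minimum: int = 3) -> Dict[str, int]:
    """Return each dimension with fewer than `minimum` distinct values, and its count."""
    copies = list(copies)
    counts = {
        "location": len({c.location for c in copies}),
        "administration": len({c.administration for c in copies}),
        "platform": len({c.platform for c in copies}),
    }
    return {dim: n for dim, n in counts.items() if n < minimum}


plan = [
    Copy("Museum X", "City A", "Museum X", "disk array"),
    Copy("Library Y", "City B", "Library Y", "tape"),
    Copy("Archive Z", "City C", "Archive Z", "disk array"),
]
print(rule_of_three_gaps(plan))  # the plan reuses one platform, so "platform" is reported
```

An empty result means the plan satisfies all three dimensions under this reading.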
CONSIDERATION: FILE FORMATS
JPEG2000
  • Wavelet compression, lossless encoding
  • 12 Parts
  • Of particular interest to documents & specimens:
      • Part 1: Core Coding System, ISO/IEC 15444-1
      • Part 6: Compound image file format
      • Part 10: JP3D, Volumetric images
  • http://www.jpeg.org/jpeg2000/
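When files arrive from many scanners and vendors, it helps to confirm that something named `.jp2` really is JPEG 2000 before it goes into an archive. A minimal sketch, assuming only the first bytes of the file are inspected: JPEG 2000 family files open with a fixed 12-byte signature box followed by a file type box carrying a four-character brand, while a bare codestream opens with the SOC and SIZ markers. The function name is hypothetical, and this check says nothing about whether the rest of the file is intact.

```python
# 12-byte JPEG 2000 signature box: length 12, type "jP  ", content 0D 0A 87 0A
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# A raw codestream starts with the SOC marker (FF4F) followed by SIZ (FF51)
CODESTREAM_START = bytes.fromhex("ff4fff51")


def identify_jpeg2000(header: bytes) -> str:
    """Classify a file from its leading bytes (at least 24 bytes recommended)."""
    if header.startswith(JP2_SIGNATURE):
        # The file type box follows: 4-byte length, "ftyp", then a 4-byte brand.
        if len(header) >= 24 and header[16:20] == b"ftyp":
            brand = header[20:24].decode("ascii", errors="replace").strip()
            return f"JPEG 2000 family file (brand {brand!r})"
        return "JPEG 2000 family file"
    if header.startswith(CODESTREAM_START):
        return "raw JPEG 2000 codestream"
    return "not JPEG 2000"
```

In use, read the first 24 bytes of each candidate file in binary mode and pass them to `identify_jpeg2000`.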
http://www.tropicos.org/ImageFullView.aspx?imageid=62182
JPEG2000 (Hurrahs & Hisses)
  • Advantages
      • Store a single file for access & preservation
      • Standards-based
      • Saves drive space (important at museum scale)
  • Disadvantages
      • Doesn’t have wide native support in many apps
      • Requires an intermediary app to decode & serve
          • But, there’s an open source option: djatoka http://djatoka.sourceforge.net
      • Reports of data loss
PDF/A
  • ISO-standardized version of PDF suitable for long-term preservation
  • Identifies a "profile" for electronic documents that ensures the documents can be reproduced exactly the same way in years to come.*
  • Makes the file self-contained (and therefore larger)
      • Embeds fonts
      • Graphics
  • * http://en.wikipedia.org/wiki/PDF/A
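A PDF/A file declares its conformance in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads that declaration with a byte-level search. It is a rough heuristic, assuming the metadata packet is stored uncompressed, and it reports only what the file claims about itself; it does not validate conformance, which needs a dedicated validator. The function name is hypothetical.

```python
import re
from typing import Optional

# \x22 and \x27 are the double and single quote characters.
# Each pattern accepts both the attribute form (pdfaid:part="1")
# and the element form (<pdfaid:part>1</pdfaid:part>).
_PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\x22\x27]|>)\s*(\d)")
_PDFA_CONF = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\x22\x27]|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level the file claims (e.g. "PDF/A-1b"), or None if it claims none."""
    if not pdf_bytes.startswith(b"%PDF-"):
        return None
    part = _PDFA_PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = _PDFA_CONF.search(pdf_bytes)
    if conformance is not None:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

A result of `None` means no claim was found, not that the file is unsuitable; a non-`None` result still needs confirming with a real validator before the file is treated as archival.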
CONSIDERATION: METADATA
The Great Thing About STANDARDS Is That There Are SO MANY To Choose From
Metadata Preservation
  • Descriptive information (metadata) provides content & context for indexing, reuse
  • Can bundle metadata within files
      • EXIF: images, common in digital cameras
      • Adobe XMP: docs, images
  • Should commit metadata to file system
  • Should not manage just in DB or other management system
  • [Diagram labels: Filesystem, <DwC> XML, JP2]
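Committing metadata to the file system can be as simple as writing an XML sidecar next to each image, so the description survives even if the database does not. A minimal sketch, assuming Darwin Core terms in the `http://rs.tdwg.org/dwc/terms/` namespace: the `record` wrapper element, the function name, and the example values are hypothetical, and the layout is illustrative rather than a schema-valid Darwin Core document.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict

DWC_NS = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace


def write_sidecar(image_path: Path, terms: Dict[str, str]) -> Path:
    """Write Darwin Core terms to an .xml file sharing the image's base name."""
    ET.register_namespace("dwc", DWC_NS)
    record = ET.Element("record")  # hypothetical wrapper element, not part of Darwin Core
    for name, value in terms.items():
        ET.SubElement(record, f"{{{DWC_NS}}}{name}").text = value
    sidecar = image_path.with_suffix(".xml")
    ET.ElementTree(record).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar
```

Calling `write_sidecar(Path("specimen_0001.jp2"), {"scientificName": "Quercus alba", "catalogNumber": "0001"})` would produce `specimen_0001.xml` beside the image, so any backup or replication of the directory carries the description along with the pixels.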
THE FUTURE
Electronic Publications
  • Happening now, has been for years
  • Should take same care in ensuring heterogeneity & diversity in digital management systems as with printed, bound books
  • Monolithic libraries have failed over time
  • Monolithic electronic archives will, too
Need a meadow…
  • http://www.biodiversitylibrary.org/page/22681143
… not a monoculture.
There is no silver bullet
  • Make best decision today
  • Stay up with technology changes & best practices
      • <insert library & archive professionals here>
  • Evaluate, experiment, document, lead
  • Move to stable new technologies when necessary
Questions?
  • Chris Freeland
  • Director, Center for Biodiversity Informatics
  • Technical Director, Biodiversity Heritage Library
  • 28 October 2011
  • Email: [email_address]
  • Twitter: @chrisfreeland


Editor's Notes

  • #3 Subject is almost irrelevant when talking about preservation. The way I preserve a scanned image of a specimen is fundamentally the same as how I’d preserve an image of a manuscript. But metadata standards are important to make sure context and descriptive data are properly described.
  • #6 Scanning is people work.
  • #15 PDF/A does not specify the management systems or archiving strategy of the file itself. PDF/A is not a total solution. It is a good format, but it needs the other pieces previously described to be “archival”.