Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
Smithsonian Institution Archives
Lynda Schmitz Fuhrig
MARAC Fall 2011 presentation

Published in: Technology, Education

  • Transcript

    • 1. Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
      Lynda Schmitz Fuhrig
      Mid-Atlantic Regional Archives Conference
      Fall 2011, Bethlehem, PA
    • 2.
    • 3. Smithsonian Institution Archives’ Mission
      • Appraise, acquire, and preserve
      • Offer a range of research and reference services
      • 4. Create and promote products and services that broaden the understanding of the Smithsonian
      • 5. Provide professional archival and conservation expertise
      Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.
    • 6. SI Archives Digital Services Division
      Curate and preserve born-digital collections
      Digitize images, video, and audio
      Research digital preservation issues
      Promote the archives through web and outreach
      SIA Accession 11-124
    • 7. Born-digital records that document the Smithsonian’s history
      Many are part of mixed collections of paper and electronic records
      Received on removable media or by server/FTP transfer
      SIA Accession 11-281
    • 15.
    • 16. SI Archives’ procedures
      /ingest with checksums
      • Make copy
      • 19. Analyze files for formats and issues
      • Convert proprietary files to preservation formats
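
The copy-and-checksum step above could be sketched roughly as follows. This is a minimal illustration, not the Archives' actual in-house script: the function names `sha256_of` and `copy_with_checksum` are hypothetical, and SHA-256 is an assumed choice, since the slides do not say which checksum algorithm is used.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_checksum(source: Path, destination: Path) -> str:
    """Copy a file, then confirm the copy matches the original (hypothetical helper)."""
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also preserves timestamps
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return before
```

Recording the returned digest alongside the accession allows the same file to be re-verified later.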
    • 20. Current preservation formats
      MS Word/WordPerfect → PDF/A or PDF
      PowerPoint, Excel → PDF/A or PDF
      GIF, JPG, BMP, etc. → TIF
      Access databases → SIARD XML
      Audio → WAV/BWF
      Websites → crawled and captured as WARC
      Email → saved to XML following the CERP/EMCAP preservation schema
      Born-digital video → not straightforward; different options
      Digitized video → Motion JPEG 2000
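
One way to apply a table like this in a batch script is a simple lookup keyed on file extension. The sketch below is illustrative only: the extensions are assumptions (the slide names formats, not extensions), and `preservation_target` is a hypothetical helper. Audio, websites, email, and video are left out because the slide does not tie them to specific source formats.

```python
from pathlib import Path
from typing import Optional

# Assumed extension-to-target mapping, based on the slide's format list.
PRESERVATION_TARGETS = {
    ".doc": "PDF/A or PDF",   # MS Word
    ".wpd": "PDF/A or PDF",   # WordPerfect
    ".ppt": "PDF/A or PDF",   # PowerPoint
    ".xls": "PDF/A or PDF",   # Excel
    ".gif": "TIF",
    ".jpg": "TIF",
    ".bmp": "TIF",
    ".mdb": "SIARD XML",      # Access database
}


def preservation_target(filename: str) -> Optional[str]:
    """Return the target preservation format, or None if the extension is unknown."""
    return PRESERVATION_TARGETS.get(Path(filename).suffix.lower())
```

Because extensions can be missing or wrong, a lookup like this is only a first pass; the file's actual format still needs to be confirmed.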
    • 21. Tools for processing
      Open source and proprietary software
      JHOVE, DROID, FITS (the File Information Tool Set; FITS is also the name of a file format)
      In-house batch scripts
      Duke Data Accessioner
      Evaluating Curator’s Workbench
      CERP (SIA-Rockefeller Archive Center) parser
    • 22. Files in disguise
      • No extension – right click to open in Notepad to see coding, especially helpful with WordPerfect
      • 23. Wrong extension – a .doc could be a Word file or it could be WordPerfect
      BMP that is a JPG
      • Complete unknowns that date back 20 years or more
      Accession 10-052
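
Checking a file's leading bytes, rather than trusting its extension, can be sketched as below. This is a minimal illustration of signature-based identification, the general approach taken by tools such as DROID, and not a replacement for them. The signature list is deliberately short, and `sniff` and `sniff_file` are hypothetical names.

```python
from typing import Optional

# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Word)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"BM", "BMP image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
]


def sniff(header: bytes) -> Optional[str]:
    """Guess a format from the first bytes of a file; None if unrecognized."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff(handle.read(16))
```

A file named `letter.doc` that begins with the bytes `FF 57 50 43` would be reported as WordPerfect, and a `.bmp` file that begins with `FF D8 FF` would be reported as JPEG.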
    • 24. Older files
      PCD (Kodak Photo CD)
      EXE (Executables)
      Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.
    • 25. DATs (Digital Audio Tapes)
      Transfer them now, if you can!
      Machine production ended
      Tapes susceptible to fungus, other problems
      DAT recorded in 1990 for the Folk Masters radio program.
      SIA Accession 06-106
    • 26. It Says It Is PDF/A
      Accession 08-149
    • 27.
    • 28. But It’s Not PDF/A
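
A PDF declares PDF/A conformance through a `pdfaid:part` entry in its XMP metadata, and that declaration is only a claim. The sketch below reads the claim and nothing more. It assumes the XMP packet is stored uncompressed, and `claimed_pdfa_part` is a hypothetical helper. Whether a file actually conforms has to be established with a dedicated validator.

```python
import re
from typing import Optional

# Matches both XMP forms: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1".
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")


def claimed_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part number a file claims, or None if it makes no claim."""
    match = PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```

A non-None result means only that the file says it is PDF/A; a file can carry the claim and still fail validation.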
    • 29. Software incompatibility issues
    • 30. New formats/flavors/technologies
      Geospatial PDF
      WWF – PDF that doesn’t print
      Keep an eye on mobile sites/apps
      3D scanning and printing
      - Point clouds
    • 31. Digital forensics
    • 32. Resources for formats
      Sustainability of Digital Formats – Library of Congress
      PRONOM – The National Archives in the UK
      Unified Digital Format Registry (UDFR) – Expected date of operation 2012
      FILExt – File Extension Source
      TrID – File Identifier
    • 33. Lynda Schmitz Fuhrig
      Digital Services Division
      Smithsonian Institution Archives website: