Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives

Smithsonian Institution Archives
Lynda Schmitz Fuhrig
Why Can’t I Read This File?
Born-Digital Challenges at the Smithsonian Institution Archives
MARAC Fall 2011 presentation

Published in: Technology, Education

    1. 1. Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
       Lynda Schmitz Fuhrig
       Mid-Atlantic Regional Archives Conference
       Fall 2011, Bethlehem, PA
    2. 2.
    3. 3. Smithsonian Institution Archives’ Mission
       • Appraise, acquire, and preserve
       • Offer a range of research and reference services
       • Create and promote products and services that broaden the understanding of the Smithsonian
       • Provide professional archival and conservation expertise
       Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.
    6. 6. SI Archives Digital Services Division
       Curate and preserve born-digital collections
       Digitize images, video, and audio
       Research digital preservation issues
       Promote the archives through web and outreach
       SIA Accession 11-124
    7. 7. Born-digital records that document the Smithsonian’s history
       • Text
       • Images
       • Drawings/CAD
       • Databases and spreadsheets
       • Audio
       • Video
       • Websites and social media
       • Email accounts
       Many are part of mixed collections of paper and electronic records
       Received on removable media or by server/FTP transfer
       SIA Accession 11-281
    15. 15.
    16. 16. SI Archives’ procedures
       • Inspect media
       • Virus scan
       • Conduct transfer/ingest with checksums
       • Make copy
       • Analyze files for formats and issues
       • Convert proprietary files to preservation formats
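The "transfer/ingest with checksums" and "make copy" steps can be illustrated with a minimal Python sketch. It is not SIA's actual workflow or tooling: the function names (`file_checksum`, `build_manifest`, `copy_and_verify`) and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, algorithm: str = "sha256") -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_checksum(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def copy_and_verify(source: Path, destination: Path) -> list[str]:
    """Copy a directory tree, then return relative paths whose checksums differ.

    An empty list means every file in the copy matches the original.
    """
    before = build_manifest(source)
    shutil.copytree(source, destination)  # destination must not already exist
    after = build_manifest(destination)
    names = before.keys() | after.keys()
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Recording the manifest at transfer time also lets later audits detect silent corruption by recomputing and comparing the digests.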
    20. 20. Current preservation formats
       MS Word/WordPerfect: PDF/A or PDF
       PowerPoint, Excel: PDF/A or PDF
       GIF, JPG, BMP, etc.: TIF
       Access databases: SIARD XML
       Audio: WAV/BWF
       Websites: crawled and captured as WARC
       Email: saved to XML following the CERP/EMCAP preservation schema
       Born-digital video: not straightforward; different options
       Digitized video: Motion JPEG 2000
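As a rough illustration, the source-to-preservation mapping on this slide could be expressed as a lookup table. The slide names formats, not file extensions, so the extensions below are assumptions, and `preservation_target` is a hypothetical helper rather than part of any SIA tool.

```python
from pathlib import Path
from typing import Optional

# Targets are taken from the slide; the extensions are illustrative assumptions.
PRESERVATION_TARGETS = {
    ".doc": "PDF/A or PDF",  # MS Word
    ".wpd": "PDF/A or PDF",  # WordPerfect
    ".ppt": "PDF/A or PDF",  # PowerPoint
    ".xls": "PDF/A or PDF",  # Excel
    ".gif": "TIF",
    ".jpg": "TIF",
    ".bmp": "TIF",
    ".mdb": "SIARD XML",     # Access database
}


def preservation_target(filename: str) -> Optional[str]:
    """Return the preservation format for a file name, or None if unmapped."""
    return PRESERVATION_TARGETS.get(Path(filename).suffix.lower())
```

A lookup keyed on the extension is only a starting point, since the extension itself may be missing or wrong.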
    21. 21. Tools for processing
       Open source and proprietary software
       JHOVE, DROID, FITS (FITS is also a format)
       MediaInfo
       In-house batch scripts
       Duke Data Accessioner
       Evaluating Curator’s Workbench
       CERP (SIA-Rockefeller Archive Center) parser
    22. 22. Files in disguise
       • No extension: right-click and open in Notepad to see the coding; especially helpful with WordPerfect
       • Wrong extension: a .doc could be Word or it could be WordPerfect; a BMP may really be a JPG
       • Complete unknowns that date back 20 years or more
       Accession 10-052
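The "open it in Notepad" check amounts to reading the first bytes of a file and comparing them with known signatures. The sketch below does this programmatically; it is a simplified stand-in for the signature matching that dedicated identification tools perform. The signature table is deliberately tiny, the expected-extension sets are assumptions, and `sniff` and `is_disguised` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# (leading bytes, format name, extensions normally used for that format)
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG", {".jpg", ".jpeg"}),
    (b"BM", "BMP", {".bmp"}),  # short signature, so false positives are possible
    (b"GIF87a", "GIF", {".gif"}),
    (b"GIF89a", "GIF", {".gif"}),
    (b"%PDF-", "PDF", {".pdf"}),
    (b"\xffWPC", "WordPerfect", {".wpd", ".wp", ".wp5"}),
    # Container used by legacy Word, Excel, and PowerPoint files
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file", {".doc", ".xls", ".ppt"}),
]


def sniff(path: Path) -> Optional[tuple[str, set[str]]]:
    """Return (format name, expected extensions) based on the file's first bytes."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, name, extensions in SIGNATURES:
        if header.startswith(magic):
            return name, extensions
    return None  # a "complete unknown" as far as this small table goes


def is_disguised(path: Path) -> bool:
    """True when the content is recognized but the extension is missing or unexpected."""
    result = sniff(path)
    if result is None:
        return False
    _, extensions = result
    return path.suffix.lower() not in extensions
```

With this table, a WordPerfect document saved as `letter.doc` or a JPEG saved as `photo.bmp` would both be flagged, while unrecognized files are left for manual inspection.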
    24. 24. Older files
       Gerber
       PCD (Kodak Photo CD)
       EXE (executables)
       Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.
    25. 25. DATs (Digital Audio Tapes)
       Transfer them now, if you can!
       Machine production ended
       Tapes susceptible to fungus, other problems
       DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106
    26. 26. It Says It Is PDF/A
       Accession 08-149
    27. 27.
    28. 28. But It’s Not PDF/A
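A file can "say" it is PDF/A because conformance is declared in the file's XMP metadata through the `pdfaid:part` property. The slides do not describe how the mismatch was detected, so the Python sketch below only shows how to read that claim; `claimed_pdfa_part` is a hypothetical helper, and it assumes the metadata is stored uncompressed. Finding the claim says nothing about whether the file actually conforms, which is the point of these two slides; confirming conformance requires a dedicated validator.

```python
import re
from typing import Optional

# Matches both XMP serializations: <pdfaid:part>1</pdfaid:part> and pdfaid:part="1"
_PDFA_PART = re.compile(rb'pdfaid:part\s*(?:=\s*["\x27]|>)\s*(\d)')


def claimed_pdfa_part(data: bytes) -> Optional[int]:
    """Return the PDF/A part number a file claims in its XMP metadata, if any.

    This reads the claim only; it does not validate the file.
    """
    match = _PDFA_PART.search(data)
    return int(match.group(1)) if match else None
```

For a file on disk, the function can be called as `claimed_pdfa_part(Path(name).read_bytes())` after importing `Path` from `pathlib`.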
    29. 29. Software incompatibility issues
    30. 30. New formats/flavors/technologies
       Geospatial PDF
       WWF: a PDF that doesn’t print
       Keep an eye on mobile sites/apps
       3D scanning and printing: point clouds
    31. 31. Digital forensics
    32. 32. Resources for formats
       Sustainability of Digital Formats (Library of Congress): http://www.digitalpreservation.gov/formats
       PRONOM (The National Archives in the UK): http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
       Unified Digital Format Registry (expected date of operation 2012): http://www.udfr.org/
       FILExt (File Extension Source): http://filext.com/
       TrID (File Identifier): http://mark0.net/soft-trid-e.html
    33. 33. Lynda Schmitz Fuhrig
       Digital Services Division
       schmitzfuhrigl@si.edu
       Smithsonian Institution Archives website: http://siarchives.si.edu