Preserving the Smithsonian Institution’s Web Presence


Presentation delivered by Lynda Schmitz Fuhrig, Electronic Archivist, and Jennifer Wright, Archivist, for the Smithsonian Institution Archives, at the Smithsonian Archives Fair on October 14, 2011 in Washington, DC.

Although it began capturing institutional websites in the late 1990s, the Smithsonian Institution Archives initiated a project in 2009 to use the Heritrix crawler to capture the explosion of public websites and social media instances maintained by the Smithsonian's many museums, research centers, and programs. This presentation reviews appraisal, accessioning, and capture issues in documenting the Smithsonian’s web presence in the early 21st century.


  1. Preserving the Smithsonian Institution’s Web Presence
     Lynda Schmitz Fuhrig and Jennifer Wright, Smithsonian Institution Archives
     Archives Fair, Oct. 14, 2011
  2. The Mission of SI Archives
     • Appraise, acquire, and preserve the records of the Institution
     • Offer a range of research and reference services
     • Establish policy and provide expert guidance on record keeping practices
     • Create and promote products and services that broaden understanding of the Smithsonian
     • Provide professional archival and conservation expertise
  3. Smithsonian’s First Home Page, 1995
  4. The Smithsonian Today
  5. Website and Social Media Registry
     • A “record” is any official recorded information, regardless of medium or characteristics, created, received, and maintained by a Smithsonian museum, office, or employee
     • Websites and social media accounts must be managed as records
     • Registry allows staff from across the Smithsonian to add and update information about all of their websites and social media accounts
  6. Appraising Records
     • All records must be appraised to determine their ultimate disposition
     • Records appraised based on administrative, legal, historical, and research value
     • Records with long-term value are transferred to Archives
  7. Appraising Traditional Websites
     • Websites are public face of Smithsonian
     • Significant historical and research value
     • Constantly changing
     • Crawl annually and before and after major redesigns
     • Work with webmasters to determine if crawls should be more or less frequent
  8. Appraising Social Media Accounts
     • All social media accounts are used differently
     • Each account appraised individually based on content
     • Accounts containing significant original content will be fully captured each year
     • Accounts consisting mostly of links to other resources will be captured occasionally to document existence
     • Method and frequency of capture may depend on terms of service and ability to avoid capturing non-Smithsonian content
  9. Past Web Archiving Procedures
     • Files transferred from the Smithsonian’s IT office
     • HTTrack web crawler
     • Scripts used to create XHTML preservation files but very manual and time-consuming
  10. Heritrix
     • Archival web crawler
     • Open source
     • Java
     • Developed by Internet Archive, National Library of Norway and National and University Library of Iceland
  11. WARC
     • WARC – Web ARChive file format
     • International standard – ISO 28500:2009
     • Extension of the ARC format in use since 1996
     • Container format
  12. Crawling in Heritrix
  13. STRI website in 1995
     SIA Accession 05-032
  14. Viewing a Crawl
  15. More To Do
  16. Social Media
     • Third-party issues
     • Privacy concerns
     • Different tools
  17. Lessons Learned
     • In-house archiving takes time
     • No one-size-fits-all solution
     • Master site registry requires regular updating
  18. Contacts and Resources
     • Lynda Schmitz Fuhrig, Digital Services Division, schmitzfuhrigl@si.edu
     • Jennifer Wright, Archives and Information Management Team, wrightjm@si.edu
     • Smithsonian Institution Archives website:
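
The slide on WARC describes it as a container format. As a rough illustration of what that means, the sketch below builds two minimal WARC/1.0 records in memory and reads them back using only the Python standard library. It is not taken from the presentation and is not how Heritrix itself writes files: the function names `build_warc_record` and `read_warc_records` and the `example.org` URLs are hypothetical, and the sketch assumes uncompressed records, whereas WARC files in practice are commonly gzip-compressed.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper).

    Layout: version line, named header fields, blank line,
    content block of Content-Length bytes, then two CRLFs.
    """
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date",
         datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF + b"".join(
        f"{name}: {value}".encode("utf-8") + CRLF for name, value in headers
    )
    return head + CRLF + payload + CRLF + CRLF


def read_warc_records(data: bytes):
    """Yield (headers, payload) for each record in uncompressed WARC bytes."""
    pos = 0
    while pos < len(data):
        # The header section ends at the first blank line.
        head_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        if not lines[0].startswith("WARC/"):
            raise ValueError(f"not a WARC record at offset {pos}")
        headers = dict(line.split(": ", 1) for line in lines[1:])
        # Content-Length, not a delimiter search, bounds the content block,
        # so the payload may itself contain blank lines.
        length = int(headers["Content-Length"])
        body_start = head_end + len(CRLF) * 2
        yield headers, data[body_start:body_start + length]
        # Skip the two CRLFs that terminate every record.
        pos = body_start + length + len(CRLF) * 2


if __name__ == "__main__":
    archive = (
        build_warc_record("http://example.org/", b"<html>home</html>")
        + build_warc_record("http://example.org/about", b"About page",
                            "text/plain")
    )
    for headers, payload in read_warc_records(archive):
        print(headers["WARC-Target-URI"], headers["Content-Length"], payload)
```

Because each record carries its own headers and length, many captured resources can be concatenated into a single file and read back one at a time, which is the sense in which the format acts as a container.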