Web Archiving: An Overview

An introduction to web archiving by Karl-Rainer Blumenthal, Web Archivist for the Internet Archive, and Sumitra Duncan, Web Archiving Coordinator for the New York Art Resources Consortium (NYARC). The presentation covers core concepts, technologies, and issues, and includes a case study of NYARC's web archiving program. It was delivered to an online meeting organized by the Metropolitan New York Library Council (METRO) on January 7, 2016.

Published in: Technology

  1. Web Archiving: An Overview. Karl-Rainer Blumenthal, Internet Archive; Sumitra Duncan, Frick Art Reference Library. Metropolitan New York Library Council, January 7, 2016
  2. What is web archiving? Web archiving is the process of collecting, preserving, and enabling access to web-native materials.
  3. Why archive the web?
     > Collect web-native resources in your traditional collecting scope.
     > Fulfill a records retention requirement.
     > Document spontaneous/online events.
     > Combat link rot and content drift (no more 404s!).
  4. How does it work?
     > Web crawlers navigate live websites and download their source code to Web ARChive (WARC) files.
  5. How does it work?
     > Replay technologies render the archived websites as they appeared at the time they were crawled.
  6. Web archiving tools and services: The Wayback Machine (https://archive.org/web/). The largest publicly available web archive in existence.
     > 450+ billion URLs
     > 100+ million websites
     > 40+ languages
     > ~1 billion URLs added per week
  9. Web archiving tools and services: Heritrix, HTTrack, Umbra, warcprox, Wget, ARC, WARC, Wayback Machine, OpenWayback, pywb (Python Wayback), Webenact, oldweb.today, Archive-It, NetarchiveSuite (DK/FR), PANDAS (AUS), Web Curator (UK/NZ), Webrecorder
  10. Who archives the web?
     > Society of American Archivists Web Archiving Roundtable: 900+ member participants
     > Archive-It: 400+ partner organizations (software service subscribers)
     > National Digital Stewardship Alliance (NDSA): surveyed web archivists in 2011, 2013, 2015...
  11. Who archives the web? [Chart: Organizations with web archiving programs by type. Values: 52%, 15%, 13%, 8%, 5%, 4%, 1%, 2%. Source: NDSA, Web Archiving in the United States: A 2013 Survey]
  12. Who archives the web? [Chart: Use of external service vs. in-house archiving. Values: 63%, 16%, 20%. Source: NDSA, Web Archiving in the United States: A 2013 Survey]
  13. Who archives the web? [Chart: Staff dedicated to web archiving program. Values: 36%, 19%, 25%, 6%, 7%, 7%. Source: NDSA, Web Archiving in the United States: A 2013 Survey]
  14. Who archives the web? [Chart: Participation in a collaborative web archive. Values: 48%, 33%, 17%, 2%. Source: NDSA, Web Archiving in the United States: A 2013 Survey]
  15. Web archiving issues and trends
     > Access and discovery
     > Big data analysis
     > Appraisal, provenance, and metadata
     > Spontaneous events and social media
     > Permissions and privacy policies
  20. NYARC
  21. Why web archiving at NYARC?
     > Drift from print to born-digital
     > Alignment with traditional collecting strengths & unique holdings
     > Ephemeral nature of websites & risk of impermanence
     > Not addressed elsewhere = risk of gap in art historical record
     > Leverage consortial collaboration = better able to be nimble
     Image: Willem de Ridder. European Mail-Order Warehouse/Fluxshop inventory with Dorothea Meijer, seated, in the home of the artist, Amsterdam. 1964-65. Gelatin silver print. The Museum of Modern Art, New York.
  22. How NYARC got started
     > 2010: Auction House Pilot Study with Archive-It
     > 2012: Planning Study
     > 2013-2015: Mellon Grant for Web Archive Implementation
  23. Web archiving life cycle at NYARC
  24. Collection development / curation
  25. Collection scope
     > Art Resources
     > Artists’ Websites
     > Auction Catalogs
     > Catalogues Raisonnés
     > Institutional Web Presence
     > NYC Galleries
     > Restitution of Lost or Looted Art
  34. Curation & Quality assurance
  35. Challenges & Lessons learned
     > Scale
     > Rapidly evolving and new technologies
     > Cost
     > Infrastructure/tools
     > Permissions/intellectual property considerations
  36. Goals & Lessons learned
     > Rich and substantial collections
     > Permanence and long-term preservation
     > Scalability and sustainability
     > Networked collections
     > Greater collaboration = crucial to work together
  38. Where can/should I get started?
     > NDSA Web Archiving in the United States Surveys: http://1.usa.gov/1z1H3jo
     > SAA Web Archiving Roundtable: www2.archivists.org/groups/web-archiving-roundtable
     > METRO Web Archiving Special Interest Group: libguides.metro.org/webarchiving
     > International Internet Preservation Consortium: netpreserve.org
     > Jill Lepore, “The Cobweb: Can the Internet be Archived?” The New Yorker, 1/26/2015: http://www.newyorker.com/magazine/2015/01/26/cobweb
  39. Thanks! ...and keep in touch!
     > Karl-Rainer Blumenthal, Web Archivist, Internet Archive, karlb@archive.org, @LandLibrarian
     > Sumitra Duncan, NYARC Web Archiving Coordinator, Frick Art Reference Library, duncan@frick.org, @artlibrariannyc
     Image credits: Condé Nast; International Internet Preservation Consortium; Susan Kare, Museum of Modern Art; National Digital Stewardship Alliance; Archive-It; Society of American Archivists; Brian Ejar; Simple Icons; Creative Stall; Iconathon; Museum of Modern Art; The Frick Collection; Brooklyn Museum; New York Art Resources Consortium
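
Slides 4 and 5 describe the two halves of web archiving: a crawler saves what a server returned into a WARC file, and a replay tool later serves that capture back by URL and capture time. The Python sketch below illustrates both ideas under stated assumptions and is not the presenters' code. The function names `build_warc_response_record` and `wayback_replay_url`, and the `http://example.org/` address, are hypothetical and chosen only for illustration. The record carries just a handful of WARC/1.0 header fields, and the replay address follows the pattern used by the public Wayback Machine (a 14-digit timestamp followed by the original URL).

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Wrap raw HTTP response bytes in a minimal WARC/1.0 'response' record.

    Assumption: `http_response` already holds the full response as received
    (status line, headers, blank line, body) and `captured_at` is
    timezone-aware.
    """
    utc = captured_at.astimezone(timezone.utc)
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {utc.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end with CRLF, a blank line separates them from the
    # payload, and two CRLFs close the record.
    header_block = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return header_block + http_response + b"\r\n\r\n"


def wayback_replay_url(target_uri: str, captured_at: datetime) -> str:
    """Build a Wayback-Machine-style replay URL for one capture."""
    utc = captured_at.astimezone(timezone.utc)
    return f"https://web.archive.org/web/{utc.strftime('%Y%m%d%H%M%S')}/{target_uri}"


if __name__ == "__main__":
    # Hypothetical capture: the response bytes stand in for a real download.
    when = datetime(2016, 1, 7, 15, 0, 0, tzinfo=timezone.utc)
    response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    print(build_warc_response_record("http://example.org/", response, when).decode("utf-8"))
    print(wayback_replay_url("http://example.org/", when))
```

Production crawlers such as Heritrix write further record types and header fields alongside each response, so this sketch shows the shape of a capture rather than a complete archive file.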
