AN INTRODUCTION TO WEB ARCHIVING
JAIME MCCURRY
NATIONAL DIGITAL STEWARDSHIP RESIDENT
FOLGER SHAKESPEARE LIBRARY : FEBRUARY 19, 2014
ndsr …

NATIONAL DIGITAL STEWARDSHIP
RESIDENCY PROGRAM
1) BORN-DIGITAL ASSET INVENTORY
2) WEB COLLECTION MANAGEMENT
WEB ARCHIVING
… what is it?
WEB ARCHIVING IS …
the process of
collecting portions of
the World Wide Web,
preserving the
collections in an
archival format, and
then serving the
archives for access
and use. (IIPC)
Web archiving is…
Collecting: “web crawlers” harvest content from seed URLs
through organized web crawls.

Preserving: crawl results and descriptive metadata are
organized into digital “WARC” preservation files

Access and use: captured content is made accessible through
browser tools or portals
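
To make the "preserving" step concrete, here is a minimal sketch of what one record inside a WARC file can look like. It is an illustration only, assuming a simplified `response` record; the function name `build_warc_record` and the sample payload are hypothetical, and production crawlers write many more record types and fields than shown here.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes, when: datetime) -> bytes:
    """Serialize one simplified WARC-style 'response' record.

    Hypothetical helper for illustration; not a complete WARC writer.
    """
    # Named header fields describing what was captured and when.
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    # Version line, header lines, then a blank line before the captured bytes.
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with two CRLF pairs.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"


# Made-up HTTP response standing in for a harvested page.
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Folger</html>"
record = build_warc_record(
    "http://www.folger.edu/",
    payload,
    datetime(2014, 2, 19, 12, 0, 0, tzinfo=timezone.utc),
)
print(record.decode("utf-8"))
```

The point of the sketch is that the captured bytes and the descriptive metadata (what was captured, from where, and when) travel together in the same file.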
Access example
The Internet Archive’s
Wayback Machine:
FSL captures, 2011
step by step
Define Collecting Scope
Select Seed URLs
Define Crawl Frequency
Set Crawl Limits
Perform Crawl
Archive and Describe
Provide Access
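
The scope, seed, and limit steps above can be sketched as a small breadth-first crawl. This is a toy illustration, not how any particular crawler is implemented: `LINKS` is a hypothetical in-memory link graph standing in for live pages (the `/visit` paths are invented), and `crawl`, `max_docs`, and `max_depth` are assumed names. Crawl frequency would be handled by whatever schedules repeated runs, which is outside this sketch.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph so the sketch runs offline; real crawls fetch pages.
LINKS = {
    "http://www.folger.edu/": [
        "http://www.folger.edu/visit",
        "http://example.org/out",
    ],
    "http://www.folger.edu/visit": [
        "http://www.folger.edu/visit/hours",
        "http://www.folger.edu/",
    ],
    "http://www.folger.edu/visit/hours": [],
    "http://example.org/out": ["http://example.org/deeper"],
}


def crawl(seeds, in_scope_hosts, max_docs, max_depth):
    """Breadth-first walk from the seed URLs, honoring scope and limits."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    captured = []
    while queue and len(captured) < max_docs:      # crawl limit: document count
        url, depth = queue.popleft()
        captured.append(url)                       # "perform crawl": record the capture
        if depth == max_depth:                     # crawl limit: link depth
            continue
        for link in LINKS.get(url, []):
            # Collecting scope: only follow links on hosts we chose to collect.
            if urlparse(link).hostname in in_scope_hosts and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return captured


result = crawl(
    seeds=["http://www.folger.edu/"],
    in_scope_hosts={"www.folger.edu"},
    max_docs=10,
    max_depth=1,
)
print(result)
```

With these settings the out-of-scope `example.org` link is skipped and the depth limit stops the crawl one hop from the seed.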
WEB ARCHIVING
… why?
Why?
Link analysis
Text analysis
Geographic analysis
Trends in technology
Why?
Web
archiving

Digital
“woe,
Destruction,
Ruin,
& decay.”
http://www.theguardian.com/world/2013/dec/16/north-korea-erases-kim-jongun-uncle-archives

Why?

accountability

https://archive-it.org/collections/386
Why?
historic preservation

http://www.nytimes.com/2013/09/24/us/politics/in-supreme-court-opinions-clicks-that-lead-nowhere.html
Why?
scholarly preservation
Why?
Cultural preservation

Apple website: November 2001

AOL: December 1996
17 years ago…

http://web.archive.org/web/19970220090612/http://www.folger.edu/
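
A Wayback Machine address like the one above carries the capture time and the original URL in its path. The following sketch pulls the two apart; the names `WAYBACK_PATTERN` and `parse_wayback_url` are hypothetical, and it handles only the plain 14-digit timestamp form shown on this slide.

```python
import re
from datetime import datetime

# Matches .../web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url):
    """Return (capture datetime, original URL) for a timestamped Wayback URL."""
    match = WAYBACK_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a timestamped Wayback Machine URL: {url}")
    timestamp, original = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


captured_at, original_url = parse_wayback_url(
    "http://web.archive.org/web/19970220090612/http://www.folger.edu/"
)
print(captured_at.isoformat(), original_url)
```

Here the timestamp decodes to 20 February 1997, which is the capture behind the "17 years ago" remark.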
  
Library of Congress
collection examples

http://www.loc.gov/websites/collections/
UK National Archives
collection examples

http://www.nationalarchives.gov.uk/webarchive/
Folger Library
… web archives
FSL web archives
http://archive-it.org/organizations/576

Mission: to preserve and enhance our collection; to make our
collection accessible to scholars and others who can use it
productively; and to advance understanding and appreciation of
Shakespeare’s writings and the culture of the early modern
world.
FSL web archives
be aware…
FSL in the future?
Shakespeare in the media
Prominent modern actors and actresses
What’s next?
FSL in the future?
FOLGER SHAKESPEARE LIBRARY =
1/19 “OTHER” INSTITUTIONS (ARCHIVE-IT)
305 TOTAL COLLECTING PARTNERS
= LESS THAN 6%
LESS THAN 13% OF COLLECTING INSTITUTIONS
2012 NDSA (LoC) web archiving SURVEY
Contact information
Jaime McCurry
National Digital Stewardship Resident
jmccurry@folger.edu

"Woe, Destruction, Ruin, and Decay:" An Introduction to Web Archiving