SlideShare a Scribd company logo
1 of 22
WARCreate and WAIL:
WARC, Wayback and Heritrix Made Easy
Mat Kelly, Michael L. Nelson, Michele C. Weigle
Old Dominion University
{mkelly,mln,mweigle}@cs.odu.edu
Web Science and Digital Libraries Research Group
ws-dl.blogspot.com
The Problem
Institutional Tools, Personal Archivists
• ON YOUR MACHINE
– Complex to Operate
– Require Infrastructure
• DELEGATED TO INSTITUTIONS
– $$$
– Lose original perspective
• Locale content tailoring (DC vs. San Francisco)
• Observation Medium (PC web browser vs. crawler)
2July 24, 2013
Arlington, Virginia Digital Preservation 2013
The Normal Solution
Ad Hoc Approaches
• Variable Output
• Deviate from standards (e.g., WARC)
• Swell for Saving A Copy
• Bad Practice for Preservation
3July 24, 2013
Arlington, Virginia Digital Preservation 2013
Archive Facebook
Better Solution
• Adapt institutional tools & mediums
4July 24, 2013
Arlington, Virginia Digital Preservation 2013
MAKING THE TOOLS SUITABLE
5July 24, 2013
Arlington, Virginia Digital Preservation 2013
Web Archiving Integration Layer
(WAIL)
• Packages Wayback, Heritrix and other
preservation tools into a GUI
• Tools are pre-configured to work together
• “One Click User-Instigated Preservation”
6July 24, 2013
Arlington, Virginia Digital Preservation 2013
Working with WAIL (Simple)
7
1. Enter URL
2. Click button
• Come back later
• Hit VIEW ARCHIVE
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Working with WAIL (Custom)
8
• Enter multiple seed
URLs (Heritrix tab)
• Customize Crawl
Parameters
• Observe crawl state
• Get included tool info
• Get meta info on crawls
July 24, 2013
Arlington, Virginia Digital Preservation 2013
And More?
• Other preservation tools packaged
– (e.g., Archive Team’s WARC-Proxy)
• GUI is extensible to facilitate further
integration of other tools
– Currently working to package UKWA’s WARC-
Explorer, UKWA’smonitrix, ODU/LANL’smcurl, a
custom memento proxy, etc.
9July 24, 2013
Arlington, Virginia Digital Preservation 2013
PRESERVING IN
THE ORIGINAL CONTEXT
10July 24, 2013
Arlington, Virginia Digital Preservation 2013
WARCreate
Create WARC files from any webpage
• Preserves what you see instead of what
crawler sees
– Capture pages behind authentication
– Manipulate then preserve
• No more preservation delegation
• Created WARCs compatible with WAIL and
Wayback instance
11July 24, 2013
Arlington, Virginia Digital Preservation 2013
extension
Ad hoc to Generally Applicable
12
Archive Facebook WARCreate
App Type
Browser (Firefox) Browser (Chrome)
Output
Navigable
Webpages
Web ARCive
(WARC) files
Target
Facebook.com Any website
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Working with WARCreate
13
• Browse as usual
• Preserve on a
whim
• WARC output
to your
Downloads folder
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Preserving the Original Context
14
Facebook-Supplied Data Dump
Archive created from
WARCreate in Wayback
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Preserving the Original Context
15
Using Scraping Tools (e.g. wget)
Archive created from
WARCreate in Wayback
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Preserving the Original Context
16
A Crawler Has No Context
Archive created from
WARCreate in Wayback
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Preserving the Original Context
17
IA/HERITRIX OBEY ROBOTS
Archive created from
WARCreate in Wayback
July 24, 2013
Arlington, Virginia Digital Preservation 2013
Preserving Beyond the Surface Web
18July 24, 2013
Arlington, Virginia Digital Preservation 2013
Creating a WARC of Your Twitter Feed
(Behind Authentication)
19July 24, 2013
Arlington, Virginia Digital Preservation 2013
Tools’ History
June 2012WARCreate presented at
Joint Conference on Digital Libraries (JCDL) ’12
* required XAMPP, “local server”
July 2012WARCreate presented at
Digital Preservation 2012
* NDSA/NDIIPP award for Future Steward
February 2013 WARCreate decoupled from XAMPP, WAIL
created, presented at
Personal Digital Archiving 2013
May 2013 NEH grant begins to “Archive What I See Now”,
port of WARCreate to Firefox & Much More
July 2013WARCreate re-finalized, 1.0 released, presented
at Digital Preservation 2013
21July 24, 2013
Arlington, Virginia Digital Preservation 2013
Filling a Need
• Capable tools prevent ad hoc archiving
– Keep it familiar
• WARCreate as Chrome extension
– Or keep it native
• WAIL has respective OS look-and-feel
• Good Archiving practices only begin with
content capture, much to do
22July 24, 2013
Arlington, Virginia Digital Preservation 2013
Available Now!
WARCreate.com
matkelly.com/wail
available for:
available for:
Web Archiving Integration Layer (WAIL)
WARCreate
bit.ly/digpres2013

More Related Content

What's hot

Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingKristen Yarmey
 
Open Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Knowledge Belgium
 
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival Access
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival AccessWikipedia and Archives: The Why and How of Using Wikipedia for Archival Access
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival AccessDominic McDevitt-Parks
 
Open Culture - How Wiki loves art and data - Packed
 Open Culture - How Wiki loves art and data - Packed Open Culture - How Wiki loves art and data - Packed
Open Culture - How Wiki loves art and data - PackedOpen Knowledge Belgium
 
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for PartnershipWikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnershipdorohoward
 
Archives 2.0 And Web 2.0
Archives 2.0 And Web 2.0Archives 2.0 And Web 2.0
Archives 2.0 And Web 2.0jkreeder
 
Alsc Wiki Overview
Alsc Wiki OverviewAlsc Wiki Overview
Alsc Wiki Overviewedu_and20
 
SAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving RoundtableSAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving Roundtablerosalielack
 
OpenGLAM in museums: Linked Open Data and Wikipedia
OpenGLAM in museums: Linked Open Data and WikipediaOpenGLAM in museums: Linked Open Data and Wikipedia
OpenGLAM in museums: Linked Open Data and WikipediaGeorgina Goodlander
 
Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Kalpesh Padia
 
Wikis 2009
Wikis 2009Wikis 2009
Wikis 2009is20090
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23Mary Bowman-Kruhm
 
Levels of Service for Digital Libraries
Levels of Service for Digital LibrariesLevels of Service for Digital Libraries
Levels of Service for Digital LibrariesGreg Colati
 
Technology Tools for the Changemaker
Technology Tools for the ChangemakerTechnology Tools for the Changemaker
Technology Tools for the ChangemakerCalvin Webster
 

What's hot (17)

Capture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web ArchivingCapture All the URLs: First Steps in Web Archiving
Capture All the URLs: First Steps in Web Archiving
 
Open Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - Romaine
 
Data visualization and school finance
Data visualization and school financeData visualization and school finance
Data visualization and school finance
 
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival Access
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival AccessWikipedia and Archives: The Why and How of Using Wikipedia for Archival Access
Wikipedia and Archives: The Why and How of Using Wikipedia for Archival Access
 
Open Culture - How Wiki loves art and data - Packed
 Open Culture - How Wiki loves art and data - Packed Open Culture - How Wiki loves art and data - Packed
Open Culture - How Wiki loves art and data - Packed
 
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for PartnershipWikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
 
Archives 2.0 And Web 2.0
Archives 2.0 And Web 2.0Archives 2.0 And Web 2.0
Archives 2.0 And Web 2.0
 
Alsc Wiki Overview
Alsc Wiki OverviewAlsc Wiki Overview
Alsc Wiki Overview
 
SAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving RoundtableSAA 2015 Web Archiving Roundtable
SAA 2015 Web Archiving Roundtable
 
OpenGLAM in museums: Linked Open Data and Wikipedia
OpenGLAM in museums: Linked Open Data and WikipediaOpenGLAM in museums: Linked Open Data and Wikipedia
OpenGLAM in museums: Linked Open Data and Wikipedia
 
Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012Visualizing Digital Collections at Archive-It - Jcdl 2012
Visualizing Digital Collections at Archive-It - Jcdl 2012
 
Wikis 2009
Wikis 2009Wikis 2009
Wikis 2009
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipedia
 
Levels of Service for Digital Libraries
Levels of Service for Digital LibrariesLevels of Service for Digital Libraries
Levels of Service for Digital Libraries
 
Technology Tools for the Changemaker
Technology Tools for the ChangemakerTechnology Tools for the Changemaker
Technology Tools for the Changemaker
 
Wikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: NotesWikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: Notes
 

Viewers also liked

2016 07-kdl-interr-infra
2016 07-kdl-interr-infra2016 07-kdl-interr-infra
2016 07-kdl-interr-infraPaul Spence
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6Andrew Su
 
Imperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipImperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipTorsten Reimer
 
Shifting Scientific Practice - ORCID 2015
Shifting Scientific Practice - ORCID 2015Shifting Scientific Practice - ORCID 2015
Shifting Scientific Practice - ORCID 2015Kaitlin Thaney
 
Scaling Islandora
Scaling IslandoraScaling Islandora
Scaling IslandoraErin Tripp
 
NSW Open Data Challenge: Data Request Service
NSW Open Data Challenge: Data Request ServiceNSW Open Data Challenge: Data Request Service
NSW Open Data Challenge: Data Request ServiceCofluence
 
The Danish Open Access Indicator
The Danish Open Access IndicatorThe Danish Open Access Indicator
The Danish Open Access IndicatorMikael Elbæk
 
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015Josh Schneider
 
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...FIAT/IFTA
 
Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016Aldo Gangemi
 
Securing the future of OA policies - Rob Johnson
Securing the future of OA policies - Rob JohnsonSecuring the future of OA policies - Rob Johnson
Securing the future of OA policies - Rob JohnsonRob Johnson
 
Pedagogy in Public: Open Education Unbound
Pedagogy in Public: Open Education UnboundPedagogy in Public: Open Education Unbound
Pedagogy in Public: Open Education UnboundRobin DeRosa
 
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...3TU.Datacentrum
 
Dsp bbc-jem rayfield-semtech2011
Dsp bbc-jem rayfield-semtech2011Dsp bbc-jem rayfield-semtech2011
Dsp bbc-jem rayfield-semtech2011Jem Rayfield
 
RDA Publishing Workflows
RDA Publishing WorkflowsRDA Publishing Workflows
RDA Publishing WorkflowsPeter McQuilton
 
Laura Czerniewicz Open Repositories Conference 2016 Dublin
Laura Czerniewicz Open Repositories Conference 2016 Dublin Laura Czerniewicz Open Repositories Conference 2016 Dublin
Laura Czerniewicz Open Repositories Conference 2016 Dublin Laura Czerniewicz
 
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth Beale
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth BealeThe DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth Beale
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth BealeMuseums Computer Group
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Annotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelAnnotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelRobert Sanderson
 

Viewers also liked (20)

2016 07-kdl-interr-infra
2016 07-kdl-interr-infra2016 07-kdl-interr-infra
2016 07-kdl-interr-infra
 
UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6UCSD / DBMI seminar 2015-02-6
UCSD / DBMI seminar 2015-02-6
 
FIBO & Schema.org
FIBO & Schema.orgFIBO & Schema.org
FIBO & Schema.org
 
Imperial College London - journey to open scholarship
Imperial College London - journey to open scholarshipImperial College London - journey to open scholarship
Imperial College London - journey to open scholarship
 
Shifting Scientific Practice - ORCID 2015
Shifting Scientific Practice - ORCID 2015Shifting Scientific Practice - ORCID 2015
Shifting Scientific Practice - ORCID 2015
 
Scaling Islandora
Scaling IslandoraScaling Islandora
Scaling Islandora
 
NSW Open Data Challenge: Data Request Service
NSW Open Data Challenge: Data Request ServiceNSW Open Data Challenge: Data Request Service
NSW Open Data Challenge: Data Request Service
 
The Danish Open Access Indicator
The Danish Open Access IndicatorThe Danish Open Access Indicator
The Danish Open Access Indicator
 
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015
ePADD and Access -- Society of American Archivists (SAA) Annual Meeting, 2015
 
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...
Social Media and the Archive. Anthony Browne. BBC Scotland - FIAT/IFTA MMC Se...
 
Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016Knowledge Patterns SSSW2016
Knowledge Patterns SSSW2016
 
Securing the future of OA policies - Rob Johnson
Securing the future of OA policies - Rob JohnsonSecuring the future of OA policies - Rob Johnson
Securing the future of OA policies - Rob Johnson
 
Pedagogy in Public: Open Education Unbound
Pedagogy in Public: Open Education UnboundPedagogy in Public: Open Education Unbound
Pedagogy in Public: Open Education Unbound
 
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...
[3.8] Archiving and Publishing in Practice Event Logs - Joos Buijs [3TU.Datac...
 
Dsp bbc-jem rayfield-semtech2011
Dsp bbc-jem rayfield-semtech2011Dsp bbc-jem rayfield-semtech2011
Dsp bbc-jem rayfield-semtech2011
 
RDA Publishing Workflows
RDA Publishing WorkflowsRDA Publishing Workflows
RDA Publishing Workflows
 
Laura Czerniewicz Open Repositories Conference 2016 Dublin
Laura Czerniewicz Open Repositories Conference 2016 Dublin Laura Czerniewicz Open Repositories Conference 2016 Dublin
Laura Czerniewicz Open Repositories Conference 2016 Dublin
 
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth Beale
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth BealeThe DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth Beale
The DiNAR Project: Meaningful Mixed Reality for Heritage - Gareth Beale
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Annotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelAnnotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation Model
 

Similar to Digital Preservation 2013

Drupal Open Source Everything
Drupal Open Source EverythingDrupal Open Source Everything
Drupal Open Source Everythinglibrarywebchic
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementNoreen Whysel
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 
Capture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingCapture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingKristen Yarmey
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceCarly Strasser
 
8 online course to master data science
8 online course to master data science8 online course to master data science
8 online course to master data scienceaks sa
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar SlidesDuraSpace
 
Linked Open Communism - c4l13
Linked Open Communism - c4l13Linked Open Communism - c4l13
Linked Open Communism - c4l13charper
 
Site story wadl2013
Site story wadl2013Site story wadl2013
Site story wadl2013Martin Klein
 
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...CILIP MDG
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
GENI Engineering Conference -- Ian Foster
GENI Engineering Conference -- Ian FosterGENI Engineering Conference -- Ian Foster
GENI Engineering Conference -- Ian FosterIan Foster
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Preserving the web
Preserving the webPreserving the web
Preserving the webJeremy Floyd
 
Introduction to Omeka
Introduction to OmekaIntroduction to Omeka
Introduction to OmekaShawn Day
 

Similar to Digital Preservation 2013 (20)

Drupal Open Source Everything
Drupal Open Source EverythingDrupal Open Source Everything
Drupal Open Source Everything
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content Management
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 
Capture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web ArchivingCapture All the URLS: First Steps in Web Archiving
Capture All the URLS: First Steps in Web Archiving
 
Semantic wikis
Semantic wikisSemantic wikis
Semantic wikis
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career Conference
 
8 online course to master data science
8 online course to master data science8 online course to master data science
8 online course to master data science
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
 
Linked Open Communism - c4l13
Linked Open Communism - c4l13Linked Open Communism - c4l13
Linked Open Communism - c4l13
 
Site story wadl2013
Site story wadl2013Site story wadl2013
Site story wadl2013
 
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-researchUc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research
 
GENI Engineering Conference -- Ian Foster
GENI Engineering Conference -- Ian FosterGENI Engineering Conference -- Ian Foster
GENI Engineering Conference -- Ian Foster
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-use
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Preserving the web
Preserving the webPreserving the web
Preserving the web
 
Introduction to Omeka
Introduction to OmekaIntroduction to Omeka
Introduction to Omeka
 

More from Mat Kelly

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkMat Kelly
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderMat Kelly
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesMat Kelly
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...Mat Kelly
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
 
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryFacilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryMat Kelly
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mat Kelly
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Mat Kelly
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedMat Kelly
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookMat Kelly
 

More from Mat Kelly (16)

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity Framework
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer Header
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web Archives
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web Archives
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
 
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryFacilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be Archived
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link Restoration
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive Facebook
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Digital Preservation 2013

  • 1. WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy Mat Kelly, Michael L. Nelson, Michele C. Weigle Old Dominion University {mkelly,mln,mweigle}@cs.odu.edu Web Science and Digital Libraries Research Group ws-dl.blogspot.com
  • 2. The Problem Institutional Tools, Personal Archivists • ON YOUR MACHINE – Complex to Operate – Require Infrastructure • DELEGATED TO INSTITUTIONS – $$$ – Lose original perspective • Locale content tailoring (DC vs. San Francisco) • Observation Medium (PC web browser vs. crawler) 2July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 3. The Normal Solution Ad Hoc Approaches • Variable Output • Deviate from standards (e.g., WARC) • Swell for Saving A Copy • Bad Practice for Preservation 3July 24, 2013 Arlington, Virginia Digital Preservation 2013 Archive Facebook
  • 4. Better Solution • Adapt institutional tools & mediums 4July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 5. MAKING THE TOOLS SUITABLE 5July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 6. Web Archiving Integration Layer (WAIL) • Packages Wayback, Heritrix and other preservation tools into a GUI • Tools are pre-configured to work together • “One Click User-Instigated Preservation” 6July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 7. Working with WAIL (Simple) 7 1. Enter URL 2. Click button • Come back later • Hit VIEW ARCHIVE July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 8. Working with WAIL (Custom) 8 • Enter multiple seed URLs (Heritrix tab) • Customize Crawl Parameters • Observe crawl state • Get included tool info • Get meta info on crawls July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 9. And More? • Other preservation tools packaged – (e.g., Archive Team’s WARC-Proxy) • GUI is extensible to facilitate further integration of other tools – Currently working to package UKWA’s WARC- Explorer, UKWA’smonitrix, ODU/LANL’smcurl, a custom memento proxy, etc. 9July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 10. PRESERVING IN THE ORIGINAL CONTEXT 10July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 11. WARCreate Create WARC files from any webpage • Preserves what you see instead of what crawler sees – Capture pages behind authentication – Manipulate then preserve • No more preservation delegation • Created WARCs compatible with WAIL and Wayback instance 11July 24, 2013 Arlington, Virginia Digital Preservation 2013 extension
  • 12. Ad hoc to Generally Applicable 12 Archive Facebook WARCreate App Type Browser (Firefox) Browser (Chrome) Output Navigable Webpages Web ARCive (WARC) files Target Facebook.com Any website July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 13. Working with WARCreate 13 • Browse as usual • Preserve on a whim • WARC output to your Downloads folder July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 14. Preserving the Original Context 14 Facebook-Supplied Data Dump Archive created from WARCreate in Wayback July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 15. Preserving the Original Context 15 Using Scraping Tools (e.g. wget) Archive created from WARCreate in Wayback July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 16. Preserving the Original Context 16 A Crawler Has No Context Archive created from WARCreate in Wayback July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 17. Preserving the Original Context 17 IA/HERITRIX OBEY ROBOTS Archive created from WARCreate in Wayback July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 18. Preserving Beyond the Surface Web 18July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 19. Creating a WARC of Your Twitter Feed (Behind Authentication) 19July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 20. Tools’ History June 2012WARCreate presented at Joint Conference on Digital Libraries (JCDL) ’12 * required XAMPP, “local server” July 2012WARCreate presented at Digital Preservation 2012 * NDSA/NDIIPP award for Future Steward February 2013 WARCreate decoupled from XAMPP, WAIL created, presented at Personal Digital Archiving 2013 May 2013 NEH grant begins to “Archive What I See Now”, port of WARCreate to Firefox & Much More July 2013WARCreate re-finalized, 1.0 released, presented at Digital Preservation 2013 21July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 21. Filling a Need • Capable tools prevent ad hoc archiving – Keep it familiar • WARCreate as Chrome extension – Or keep it native • WAIL has respective OS look-and-feel • Good Archiving practices only begin with content capture, much to do 22July 24, 2013 Arlington, Virginia Digital Preservation 2013
  • 22. Available Now! WARCreate.com matkelly.com/wail available for: available for: Web Archiving Integration Layer (WAIL) WARCreate bit.ly/digpres2013

Editor's Notes

  1. Introduce ,am here to speak about some of our efforts in building tools for casual archivists hoping to preserve web pages.
  2. First start with identifying problem:Digital preservation tools are ill suited for use by individual digital archivistsTools of focus, Htrix and Wayback, while FOSS, require technical know-how.To remedy, individuals can delegate the task of digpres to institutions but this poses many more problemsOne we have investigates are variances in perspective, as examplified by early crawls of Cragslist, which used GeoIP, and thus attached the saved content to the San Fran CL; Variance in perpective relative to tool used, i.e., what crawler sees may not be the same as what we want preserved
  3. Those that want to preserve resort to ad hoc techniquesThese techniques produced archives that may not stand test of time due to format issues and bad practice procedures used to save pagesEarly work for FB preservation (AFB) tried to remedy this by making the process consistent by saving all pages in one’s FB profile but was limited in scope, frequently broke due to FB redesigns
  4. We saw merit in putting preservation in the hands of those that decide what is important but wanted something:More general purpose – applicable to any webpageUsed standard formats like WARC andTook advantage of the tools that have already been created
  5. To conquer this last goal of adapting institutional tools to amateur archivists, we sought to adapt Heritrix, Wayback and other tools and make them more useable.
  6. Create WAILTook institutional toolsConfigured for relativityCoded up GUI to interact with toolsAllow crawls to be initiated and interacted with via GUIMade it easy: One Click User-Instigated Preservation
  7. Simple working for a one-off crawl:Enter URLHit the Archive Now buttonCheck back later
  8. Allow further capability likeservices managementCustom crawlCrawl status checkingAll still GUI-based