Archive What I See Now
Mat Kelly, Michael L. Nelson, Michele C. Weigle
Old Dominion University
{mkelly,mln,mweigle}@cs.odu.edu
Web Science and Digital Libraries Research Group
ws-dl.blogspot.com
What’s the Problem?
• Web archives capture a lot but not everything
• Individuals’ interests may not be captured
• Timely capture is important
• Capture capability must be enabled for all

November 12, 2013
Salt Lake City, Utah

2013 Archive-It Partner Meeting
Timely Capture Is Important
Use Case: Capturing Breaking Stories

• Calls for seed URIs are reactive
• Not quick enough for rapidly evolving events

Timely Capture Is Important
Use Case: Capturing Breaking Stories

• Intermediate mementos missed
• The story is incomplete

The Amateur Archivist’s Approach to Just-In-Time Capture
• Users take ad hoc approaches
1. Screenshots of pages
2. Other sub-optimal approaches

Enabling the Amateur Web Archivist
• Acknowledge the problem:
– THE TOOLS ARE DIFFICULT!

• Resolve the problem:
– Build more accessible tools (make it EASY)
– Appeal to standards (e.g., WARC, ISO 28500:2009)
– Make the tools interoperable
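To make the appeal to the WARC standard concrete, here is a minimal sketch of how a single WARC/1.0 `response` record can be serialized. It is an illustration only and not WARCreate's code: the function name `build_warc_response_record`, the example URI, and the captured HTTP message are all hypothetical, and a complete archive would normally also carry `warcinfo` and `request` records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_message: bytes,
                               when: datetime) -> bytes:
    """Serialize one WARC/1.0 'response' record: header lines, block, trailer.

    `when` is assumed to already be in UTC, since WARC-Date is written
    with a trailing 'Z'.
    """
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_message)}",
    ]
    # Header lines end with CRLF; an empty line separates header from block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is followed by two CRLFs.
    return head + http_message + b"\r\n\r\n"


# Hypothetical captured HTTP response (status line, headers, body).
http_message = (b"HTTP/1.1 200 OK\r\n"
                b"Content-Type: text/html\r\n\r\n"
                b"<html><body>Breaking story</body></html>")
record = build_warc_response_record(
    "http://example.com/story", http_message,
    datetime(2013, 11, 12, 17, 0, 0, tzinfo=timezone.utc))
```

Because the record is plain bytes with a declared length, any tool that follows the same standard should be able to read it back, which is the interoperability argument the slide makes.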

The Institutional Dilemma
• Safety of Archives Requires $
• Institutions Require Funding
• Users’ Hard Drives Fail
– No access to Save-As files and screenshots

• Hybrid approach needed
– Leverage institutional safety, formats, and tech
– Allow direct user deposits

So we built it!
WARCreate – Google Chrome extension
• Create web archives from browser
• Capture personalized content
• Preserve on a whim

1. Mat Kelly and Michele C. Weigle. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012). Washington, DC, June 2012, pp. 437-438.
2. Mat Kelly, Michele C. Weigle, Michael Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.

WARCreate – How it Works

Preserving the Original Context
Use Case: Capturing Facebook
[Figure: archive created from WARCreate replayed in Wayback, alongside the Facebook-supplied data dump]
Liberated Data Doesn’t Give The Whole Picture
Preserving the Original Context
Use Case: Capturing Facebook
[Figure: result of using scraping tools (e.g., wget), alongside the archive created from WARCreate replayed in Wayback]
The Target Controls What is Allowed
Preserving the Original Context
Use Case: Capturing Facebook
[Figure: archive created from WARCreate replayed in Wayback]
A Crawler Has No Context
No Credentials → No Entry → No Archiving
Preserving the Original Context
Use Case: Capturing Facebook
[Figure: archive created from WARCreate replayed in Wayback]
IA/HERITRIX OBEY ROBOTS
No Means No, if They Say and You Obey
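The robots exclusion check that a well-behaved crawler performs can be sketched with Python's standard library. This is an illustration of the general mechanism, not Heritrix's implementation; the robots.txt content, the user-agent string, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed robots.txt rules allow this agent to fetch url."""
    return parser.can_fetch(user_agent, url)


public_ok = crawler_may_fetch("example-crawler", "http://example.com/news")
private_ok = crawler_may_fetch("example-crawler", "http://example.com/private/wall")
```

A crawler that obeys these rules skips the disallowed path entirely, whereas a capture made from the user's own browser session is driven by what the user is already viewing.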
Users can now create WARCs!
WARCreate – Google Chrome extension
• Create web archives from browser
• Capture personalized content
• Preserve on a whim

Users don’t know WHAT TO DO with WARC files
So, again, we built it!
Web Archiving Integration Layer (WAIL)
• Heritrix, Wayback, etc. packaged for PC
• GUI front-end allows “One-Click Preservation”
• Provides means to replay WARCs

1. Mat Kelly, Michele C. Weigle, Michael Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD.
2. Mat Kelly, Michael Nelson and Michele C. Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA.

The Archive What I See Now Project

The Archive What I See Now Project: Three Goals
1. Port WARCreate to Firefox
2. Add functionality in … to upload WARCs to: [service logos not preserved in the transcript]
3. Implement Sequential Archiving
Porting WARCreate to Firefox
• Disjoint extension/add-on APIs
– Little logic can be re-used

• Problems with HTTP header capture in Chrome are trivial in Firefox
– Chrome = highly asynchronous fetching

• Code to save WARC to PC from browser is reusable in Firefox

The Archive What I See Now Project: Three Goals
1. Port WARCreate to Firefox ✓ In βeta now!
2. Add functionality in … to upload WARCs to: [service logos not preserved in the transcript]
3. Implement Sequential Archiving
Uploading WARCs: An Open Question
• Working with Archive-It to determine feasibility of user-provided WARCs
• Consideration of data integrity
• Should data be merged with A-IT crawled WARCs?
– How do we account for your www.facebook.com vs. my www.facebook.com?

• Privacy?
Sequential Archiving?
• Similar to a focused crawl, but URIs are defined on a per-site basis to be comprehensive
– Akin to [tool logo not preserved in the transcript], but generalized

• Implemented in WARCreate
• Utilize a per-site specification to keep tools from breaking★
Per-site hierarchy (site logos not preserved; column names inferred from each site's terminology):

Generic concept      [Facebook]   [Google+]   [Twitter]
personal stream      wall         posts       my tweets
global stream        news feed    streams     followees’ tweets
multimedia-photos    photos       photos      N/A
multimedia-videos    videos       videos      N/A
photo collection     albums       N/A         N/A
posts                notes        N/A         N/A
friends              friends      circles     following

★ Discovery & Scraping: The Information Retrieval Approach versus The Digital Libraries Approach
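One way to picture sequential archiving is as a per-site specification that is expanded into an ordered list of URIs to capture one after another. The sketch below is an assumption-laden illustration, not the specification format used by WARCreate: the host `example-social.test`, the section names, the URI templates, and the function `sequential_uris` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-site specification: generic section name -> URI template.
# Dict insertion order defines the order in which sections are captured.
SITE_SPECS: Dict[str, Dict[str, str]] = {
    "example-social.test": {
        "personal stream": "https://example-social.test/{user}/wall",
        "multimedia-photos": "https://example-social.test/{user}/photos",
        "friends": "https://example-social.test/{user}/friends",
    },
}


def sequential_uris(host: str, user: str) -> List[str]:
    """Expand a site's specification into the ordered URIs to archive.

    Returns an empty list when no specification exists for the host.
    """
    spec = SITE_SPECS.get(host, {})
    return [template.format(user=user) for template in spec.values()]


uris = sequential_uris("example-social.test", "alice")
```

Keeping the templates in data rather than in code means that a site redesign only requires updating the specification, which is one reading of "keep tools from breaking".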
Online Hierarchy Definition
• Only (and optionally) applied on recognized sites
– scraping as fallback for establishing hierarchy

• Not limited to social media
– CNN.com, MSNBC.com, etc. have similar hierarchies

• Lives online; tools refer to it and so stay up to date
• Standardized spec* prototype is live online
* M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012
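The "recognized sites first, scraping as fallback" behavior can be sketched as follows. This is a simplified illustration under stated assumptions: the `RECOGNIZED` table stands in for the online specification (whose real format is not shown in the slides), and `news.example`, `LinkCollector`, and `plan_capture` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for the online specification: host -> section paths.
RECOGNIZED: Dict[str, List[str]] = {
    "news.example": ["/", "/world", "/politics"],
}


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags (the scraping fallback)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def plan_capture(page_uri: str, page_html: str) -> List[str]:
    """Use the site's specification if recognized, else scrape links from the page."""
    host = urlparse(page_uri).hostname or ""
    if host in RECOGNIZED:
        return [urljoin(page_uri, path) for path in RECOGNIZED[host]]
    collector = LinkCollector()
    collector.feed(page_html)
    return [urljoin(page_uri, href) for href in collector.hrefs]


known = plan_capture("http://news.example/index.html", "")
scraped = plan_capture("http://other.example/a/",
                       '<a href="b.html">b</a><a href="/c">c</a>')
```

The specification path gives a predictable, comprehensive hierarchy, while the fallback only discovers whatever links happen to be on the current page.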
Summary
• Firefox WARCreate in Beta
– Chrome WARCreate users can currently Archive What They See Now with [tool logos not preserved in the transcript]

• Sequential Archiving implemented in Chrome WARCreate, needs porting
• Next Big Hurdle: working with Archive-It on WARC upload logistics

Archive What I See Now
• Download Our Archiving Tools!
Web Archiving Integration Layer (WAIL)
http://matkelly.com/WAIL
One-Click Preservation
Heritrix, Wayback and Others On Your PC!

WARCreate for Chrome
http://WARCreate.com
Create WARC files from any web page from your browser
Firefox version in beta, available soon!

• Share Your Use Cases for Capturing the Unpreserved and the Unpreservable
• Help Us Improve Our Tools, Give Feedback!
http://bit.ly/wc-wail

More Related Content

What's hot

Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocksnaveenv
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKSnaveenv
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising Anna Perricci
 
Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web ArchivingAnna Perricci
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Anna Perricci
 
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Brian Gray
 
Visualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssVisualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssF. Tim Knight
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23Mary Bowman-Kruhm
 
Beyond Research Guides
Beyond Research GuidesBeyond Research Guides
Beyond Research GuidesWiLS
 
Building Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedBuilding Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedMark-Shane Scale ♞
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Togethernullhandle
 
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisUsing Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisBrian Gray
 
Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Nathaniel Russell
 
SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012bmmsben
 

What's hot (20)

Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocks
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKS
 
Data visualization and school finance
Data visualization and school financeData visualization and school finance
Data visualization and school finance
 
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
DPC Web Archiving & Preservation Webinar #4: Outreach & Awareness Raising
 
Introduction to Web Archiving
Introduction to Web ArchivingIntroduction to Web Archiving
Introduction to Web Archiving
 
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
Human Scale Web Collecting for Individuals and Institutions (Webrecorder Work...
 
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
Kent State Workshop - Using Web 2.0 Principles to Become Librarian 2.0, wikis...
 
Visualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ssVisualizing linkeddata aall2012d-ss
Visualizing linkeddata aall2012d-ss
 
Slide show sa cworkshop apr23
Slide show   sa cworkshop apr23Slide show   sa cworkshop apr23
Slide show sa cworkshop apr23
 
Beyond Research Guides
Beyond Research GuidesBeyond Research Guides
Beyond Research Guides
 
Nassau Library2
Nassau Library2Nassau Library2
Nassau Library2
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipedia
 
Building Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies RevisedBuilding Together With Collaborative Web Technologies Revised
Building Together With Collaborative Web Technologies Revised
 
Building Web Archiving Technology, Together
Building Web Archiving Technology, TogetherBuilding Web Archiving Technology, Together
Building Web Archiving Technology, Together
 
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - WikisUsing Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
Using Web 2.0 Principles to Become Librarian and Educator 2.0 - Wikis
 
Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013Clicklaw wikibooks for beyond hope 2013
Clicklaw wikibooks for beyond hope 2013
 
Wikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: NotesWikimedia, MediaWiki & Education in IT: Notes
Wikimedia, MediaWiki & Education in IT: Notes
 
Cyberlaw presentation
Cyberlaw presentationCyberlaw presentation
Cyberlaw presentation
 
Wikipedia
Wikipedia Wikipedia
Wikipedia
 
SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012SCLS Web 2.0 2.0 presentation 2012
SCLS Web 2.0 2.0 presentation 2012
 

Similar to Archive What I See Now - Archive-It Partner Meeting 2013 2013

On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...✔ Eric David Benari, PMP
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementNoreen Whysel
 
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Patrick "Tod" Colegrove
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingSawood Alam
 
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...ScottAinsworth
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingAcquia
 
METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015Victoria Steeves
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...The Frick Collection
 
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workDetecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workMartin Hawksey
 
MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey scottsayre
 
Social Bookmarking:Del.icio.us Exposé
Social Bookmarking:Del.icio.us ExposéSocial Bookmarking:Del.icio.us Exposé
Social Bookmarking:Del.icio.us ExposéJudy O'Connell
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 
The Evolution of the UC San Diego Library DAMS
The Evolution of the  UC San Diego Library DAMSThe Evolution of the  UC San Diego Library DAMS
The Evolution of the UC San Diego Library DAMSMatthew Critchlow
 
Metadata - Linked Data
Metadata - Linked DataMetadata - Linked Data
Metadata - Linked DataRichard Wallis
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked DataRichard Wallis
 
Practical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarPractical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarKristin Briney
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...Mat Kelly
 

Similar to Archive What I See Now - Archive-It Partner Meeting 2013 2013 (20)

On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
 
Digital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content ManagementDigital Infrastructure: Storage and Content Management
Digital Infrastructure: Storage and Content Management
 
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...Academic Makerspaces: Connections & Conversations - presentation at Internet ...
Academic Makerspaces: Connections & Conversations - presentation at Internet ...
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
 
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
Browsing and Recomposition Policies to Minimize Temporal Error When Utilizing...
 
Drupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: LaunchingDrupal for Project Managers, Part 3: Launching
Drupal for Project Managers, Part 3: Launching
 
METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015METRO Conference Presentation Jan 2015
METRO Conference Presentation Jan 2015
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
 
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial workDetecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
Detecting and Analyzing Subpopulations within Connectivist MOOCs: Initial work
 
MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey MCN 2013 Museum Digital Asset Management and Aggregation Survey
MCN 2013 Museum Digital Asset Management and Aggregation Survey
 
Social Bookmarking:Del.icio.us Exposé
Social Bookmarking:Del.icio.us ExposéSocial Bookmarking:Del.icio.us Exposé
Social Bookmarking:Del.icio.us Exposé
 
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...An Extensible Framework for Creating Personal Web Archives of Content Behind ...
An Extensible Framework for Creating Personal Web Archives of Content Behind ...
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 
Semantic wikis
Semantic wikisSemantic wikis
Semantic wikis
 
The Evolution of the UC San Diego Library DAMS
The Evolution of the  UC San Diego Library DAMSThe Evolution of the  UC San Diego Library DAMS
The Evolution of the UC San Diego Library DAMS
 
Metadata - Linked Data
Metadata - Linked DataMetadata - Linked Data
Metadata - Linked Data
 
Metadata / Linked Data
Metadata / Linked DataMetadata / Linked Data
Metadata / Linked Data
 
Practical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG WebinarPractical Data Management - ACRL DCIG Webinar
Practical Data Management - ACRL DCIG Webinar
 
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...
JCDL 2015 Doctoral Consortium - A Framework for Aggregating Private and Publi...
 

More from Mat Kelly

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkMat Kelly
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderMat Kelly
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesMat Kelly
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Exploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesExploring Aggregation of Personal, Private, and Institutional Web Archives
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
 
Facilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryFacilitation of the A Posteriori Replication of Web Published Satellite Imagery
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryMat Kelly
 
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...
Mink: Integrating the Live and Archived Web Viewing Experience Using Web Brow...Mat Kelly
 
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014
Efficient Thumbnail Generation for Web Archives at Digital Preservation 2014Mat Kelly
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemIEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction System
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
 
Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMaking Enterprise-Level Archive Tools Accessible for Personal Web Archiving
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
 
The Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedThe Revolution Will Not Be Archived
The Revolution Will Not Be ArchivedMat Kelly
 
WARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageWARCreate - Create Wayback-Consumable WARC Files from Any Webpage
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
 
NDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationNDIIPP/NDSA 2011 - YouTube Link Restoration
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
 
NDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookNDIIPP/NDSA 2011 - Archive Facebook
NDIIPP/NDSA 2011 - Archive FacebookMat Kelly
 

More from Mat Kelly (17)

Aggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity FrameworkAggregating Private and Public Web Archives Using the Mementity Framework
Aggregating Private and Public Web Archives Using the Mementity Framework
 
Client-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer HeaderClient-Assisted Memento Aggregation Using the Prefer Header
Client-Assisted Memento Aggregation Using the Prefer Header
 
A Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web ArchivesA Framework for Aggregating Public and Private Web Archives
A Framework for Aggregating Public and Private Web Archives
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Archive What I See Now - Archive-It Partner Meeting 2013

  • 1. Archive What I See Now. Mat Kelly, Michael L. Nelson, Michele C. Weigle, Old Dominion University, {mkelly,mln,mweigle}@cs.odu.edu. Web Science and Digital Libraries Research Group, ws-dl.blogspot.com
  • 2. What’s the Problem? • Web archives capture a lot but not everything • Individuals’ interests may not be captured • Timely capture is important • Capture capability must be enabled for all. November 12, 2013, Salt Lake City, Utah. 2013 Archive-It Partner Meeting
  • 3. Timely Capture Is Important (Use Case: Capturing Breaking Stories) • Calls for seed URIs are reactive • Not quick enough for rapidly evolving events
  • 4. Timely Capture Is Important (Use Case: Capturing Breaking Stories) • Intermediate mementos missed • The story is incomplete
  • 5. Timely Capture Is Important (Use Case: Capturing Breaking Stories)
  • 6. Timely Capture Is Important (Use Case: Capturing Breaking Stories)
  • 7. The Amateur Archivist’s Approach to Just-In-Time Capture • Users take ad hoc approaches: 1. Screenshots of pages 2. Other sub-optimal approaches
  • 8. Enabling The Amateur Web Archivist • Acknowledge the problem: – THE TOOLS ARE DIFFICULT! • Resolve the problem: – Build more accessible tools (make it EASY) – Appeal to standards (e.g., WARC, ISO 28500:2009) – Make tools interoperable
  • 9. The Institutional Dilemma • Safety of Archives Requires $ • Institutions Require Funding • Users’ Hard Drives Fail – No Access to Save-As files and Screenshots • Hybrid approach needed – Leverage institutional safety, formats, and tech – Allow direct user deposits
  • 10. So we built it! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim. References: 1. Mat Kelly and Michele C. Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2012), Washington, DC, June 2012, pp. 437-438. 2. Mat Kelly, Michele C. Weigle, Michael Nelson, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session: Web Archiving; 2012 Jul 25; Washington, DC.
  • 11. WARCreate – How it Works
  • 12. Preserving the Original Context (Use Case: Capturing Facebook) [Figures: archive created from WARCreate in Wayback; Facebook-supplied data dump] Liberated Data Doesn’t Give The Whole Picture
  • 13. Preserving the Original Context (Use Case: Capturing Facebook) [Figures: using scraping tools (e.g., wget); archive created from WARCreate in Wayback] The Target Controls What is Allowed
  • 14. Preserving the Original Context (Use Case: Capturing Facebook) [Figure: archive created from WARCreate in Wayback] A Crawler Has No Context: No Credentials → No Entry → No Archiving
  • 15. Preserving the Original Context (Use Case: Capturing Facebook) [Figure: archive created from WARCreate in Wayback] IA/HERITRIX OBEY ROBOTS. No Means No, if They Say and You Obey
  • 16. So we built it! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim
  • 17. Users can now create WARCs! WARCreate – Google Chrome extension • Create web archives from browser • Capture personalized content • Preserve on a whim. Users don’t know WHAT TO DO with WARC files
  • 18. So, again, we built it! Web Archiving Integration Layer (WAIL) • Heritrix, Wayback, etc. packaged for PC • GUI front-end allows “One-Click Preservation” • Provides means to replay WARCs. References: 1. Mat Kelly, Michele C. Weigle, Michael Nelson, "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session; 2013 Feb 21; College Park, MD. 2. Mat Kelly, Michael Nelson and Michele C. Weigle, "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013, Workshops and Sessions: Web Archiving; 2013 Jul 24; Alexandria, VA.
  • 19. So, again, we built it! Web Archiving Integration Layer (WAIL) • Heritrix, Wayback, etc. packaged for PC • GUI front-end allows “One-Click Preservation” • Provides means to replay WARCs
  • 20. The Archive What I See Now Project
  • 21. The Archive What I See Now Project: Three Goals 1. Port [logos] 2. Add functionality in [logos] to upload WARCs to [logos] 3. Implement Sequential Archiving
  • 22. Porting WARCreate to Firefox • Disjoint extension/add-on APIs – Little logic can be re-used • Problems with HTTP header capture in Chrome are trivial in Firefox – Chrome = highly asynchronous fetching • Code to save WARC to PC from browser reusable in Firefox
  • 23. The Archive What I See Now Project: Three Goals 1. Port [logos] (✓ In βeta now!) 2. Add functionality in [logos] to upload WARCs to [logos] 3. Implement Sequential Archiving
  • 24. The Archive What I See Now Project: Three Goals 1. Port [logos] 2. Add functionality in [logos] to upload WARCs to [logos] 3. Implement Sequential Archiving
  • 25. Uploading WARCs: An Open Question • Working with Archive-It to determine feasibility of user-provided WARCs • Consideration of data integrity • Should data be merged with Archive-It crawled WARCs? – How do we account for your www.facebook.com vs. my www.facebook.com? • Privacy?
  • 26. The Archive What I See Now Project: Three Goals 1. Port [logos] 2. Add functionality in [logos] to upload WARCs to [logos] 3. Implement Sequential Archiving
  • 27. Sequential Archiving? • Similar to a focused crawl but URIs defined on per-site basis to be comprehensive – Akin to [logo] but generalized • Implemented into WARCreate • Utilize per-site specification to keep tools from breaking★ [Table: content categories per site; the column headers were logos. Rows recoverable in order: multimedia-photos: photos, photos, N/A; multimedia-videos: videos, videos, N/A; photo collection: albums, N/A, N/A; posts: notes, N/A, N/A. Cells of the personal stream, global stream, and friends rows, order not recoverable: my tweets, news feed, streams, followees’ tweets, posts, wall, friends, circles, following] ★ Discovery & Scraping: The Information Retrieval Approach versus The Digital Libraries Approach
  • 28. Online Hierarchy Definition • Only (and optionally) applied on recognized sites – scraping as fallback for establishing hierarchy • Not limited to social media – CNN.com, MSNBC.com, etc. have similar hierarchies • Lives online; tools refer to it and so are always up to date • Standardized spec* prototype is live online. * M. Kelly, An Extensible Framework for Creating Personal Archives of Web Resources Requiring Authentication, Aug 2012
  • 29. Summary • Firefox WARCreate in Beta – Chrome WARCreate Users Can Currently Archive What They See Now with [logos] • Sequential Archiving Implemented in Chrome WARCreate, needs porting • Next Big Hurdle: Working with Archive-It on WARC upload logistics
  • 30. Archive What I See Now • Download Our Archiving Tools! – Web Archiving Integration Layer (WAIL), http://matkelly.com/WAIL: One-Click Preservation, Heritrix, Wayback and Others On Your PC! – WARCreate for Chrome, http://WARCreate.com: Create WARC files from any web page from your browser (Firefox version in beta, Available Soon!) • Share Your Use Cases for Capturing the Unpreserved and the Unpreservable • Help Us Improve Our Tools, Give Feedback! http://bit.ly/wc-wail
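
The WARC files mentioned throughout these slides follow the container format standardized as ISO 28500. As a rough illustration of what a single record in such a file looks like, here is a minimal Python sketch that serializes one WARC/1.0 `response` record. It is not WARCreate's code (WARCreate is a browser extension); the function name `build_warc_response_record`, the example URI, and the sample payload are assumptions made for illustration, and a complete archive would normally also carry `warcinfo` and `request` records.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Serialize one WARC/1.0 'response' record (hypothetical helper)."""
    # WARC-Date is a UTC timestamp in ISO 8601 form.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        # Content-Length counts the bytes of the content block only.
        f"Content-Length: {len(http_payload)}",
    ]
    header_block = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # Record layout: header block, blank line, content block, two CRLFs.
    return header_block + http_payload + b"\r\n\r\n"


# Assumed sample of a captured HTTP response (status line, headers, body).
example_payload = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
record = build_warc_response_record("http://example.com/", example_payload)
```

Appending several such records to one file opened in binary mode gives an uncompressed WARC file.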