WADL 2013
July 25-26th Indianapolis, IN
Martin Klein
@mart1nkle1n
martinklein0815@gmail.com
SiteStory
Archiving Done Differently
http://mementoweb.github.io/SiteStory/
Justin F. Brunelle
jbrunelle@cs.odu.edu
LANL SiteStory Team
lead developer
Archiving - the traditional way
• Actively crawl the web
• For example, using Heritrix
Archiving - the traditional way
• Issues with crawler-based archiving:
o Requests can be rejected (robots.txt, user-agent, IP)
o Crawler can be deceived (geo-location, user-agent)
o Crawler can be trapped (crawl my calendar!)
o Requires constant and massive bandwidth
o Implied timing problem: when to crawl?
Archiving - the traditional way
Timing problem:
• Update 1 viewed but not archived
Timeline for resource R:
t1: R created
t2: browser visit 1
t3: crawler visit 1
t4: R update 1
t5: browser visit 2
t6: R update 2
Archiving - the SiteStory way
• Transactional Web archiving
• Archive accepts HTTP transaction between browser
and server
Archiving - the SiteStory way
Timing problem:
• Update 1 viewed and archived
Timeline for resource R:
t1: R created
t2: browser visit 1
t3: crawler visit 1
t4: R update 1
t5: browser visit 2
t6: R update 2
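The two timing slides can be replayed in a few lines. The following is a minimal sketch, not SiteStory code: it assumes that a crawler-based archive stores whatever R looks like when the crawler visits, and that a transactional archive stores whatever R looks like whenever a browser visits. The function name `archived_versions` is made up for this illustration.

```python
# Events from the slide's timeline, in order t1..t6.
# "server" events change resource R; "browser" and "crawler" events read it.
TIMELINE = [
    ("t1", "server", "created"),
    ("t2", "browser", "visit1"),
    ("t3", "crawler", "visit1"),
    ("t4", "server", "update1"),
    ("t5", "browser", "visit2"),
    ("t6", "server", "update2"),
]


def archived_versions(timeline, archiving_actor):
    """Return the versions of R captured when only `archiving_actor` reads are archived."""
    current = None
    archived = []
    for _time, actor, event in timeline:
        if actor == "server":
            current = event  # R changes state
        elif actor == archiving_actor and current not in archived:
            archived.append(current)  # this read ends up in the archive
    return archived


# Assumption: the crawler-based archive only sees the crawler's visit.
crawler_archive = archived_versions(TIMELINE, "crawler")
# Assumption: the transactional archive sees every browser visit.
transactional_archive = archived_versions(TIMELINE, "browser")

print(crawler_archive)        # ['created']
print(transactional_archive)  # ['created', 'update1']
```

Update 2 appears in neither archive in this replay, because no visit follows it on the timeline.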
Archiving - the SiteStory way
• Challenges with transactional archiving:
o The server has to cooperate to be archived
o Data must be transferred to the archive, in batch mode or in real time
o Archive must trust the transmission to be authentic
o Resources from external servers have to be archived out-of-band
o Deduplication challenges
- Alias: different URI, same response
- Conneg (content negotiation): same URI, different response
- Determining “significant” content change
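The alias and conneg cases can be made concrete with a toy store. This is a sketch under stated assumptions, not the SiteStory implementation: the class `DedupStore`, the use of SHA-256, and the choice to index versions by (URI, media type) are all illustrative. It compares bodies byte for byte, so it does not address what counts as a “significant” change.

```python
import hashlib


class DedupStore:
    """Toy in-memory store: each distinct body is kept once, keyed by its digest."""

    def __init__(self):
        self.bodies = {}  # digest -> body bytes
        self.index = {}   # (uri, content_type) -> list of digests, oldest first

    def add(self, uri, content_type, body):
        digest = hashlib.sha256(body).hexdigest()
        # Alias case: a body already seen under another URI is not stored again.
        self.bodies.setdefault(digest, body)
        # Conneg case: representations of one URI are tracked separately.
        versions = self.index.setdefault((uri, content_type), [])
        if not versions or versions[-1] != digest:
            versions.append(digest)  # record a new version only when the body changed
        return digest


store = DedupStore()
# Alias: two URIs, identical response.
store.add("http://example.org/", "text/html", b"<h1>Home</h1>")
store.add("http://example.org/index.html", "text/html", b"<h1>Home</h1>")
# Conneg: one URI, two representations.
store.add("http://example.org/data", "text/html", b"<p>42</p>")
store.add("http://example.org/data", "application/json", b'{"value": 42}')

print(len(store.bodies))  # 3 distinct bodies
print(len(store.index))   # 4 (URI, media type) entries
```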
SiteStory Status Quo
• mod_sitestory sends an HTTP PUT to the SiteStory Web Archive upon a client’s GET request
o not for POST, DELETE, etc.
o only for HTTP response codes 200, 302, 303
• Client IP can optionally be included in the stored headers (configurable)
• Header information stored in BerkeleyDB, response body in the file system
• Deduplication via hash(body)
• Offloading content as WARC files is possible (read: recommended)
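The rules on this slide amount to a small decision function. The sketch below only restates them in Python; it is not taken from mod_sitestory (an Apache module), and the names `should_archive`, `stored_headers`, and the `X-Client-IP` header are hypothetical.

```python
# Rules as stated on the slide: archive GET requests only,
# and only when the response code is 200, 302, or 303.
ARCHIVED_METHODS = {"GET"}
ARCHIVED_STATUS_CODES = {200, 302, 303}


def should_archive(method, status_code):
    """Return True if this transaction would be pushed to the archive."""
    return method.upper() in ARCHIVED_METHODS and status_code in ARCHIVED_STATUS_CODES


def stored_headers(response_headers, client_ip, include_client_ip=False):
    """Return the headers to store; the client IP is added only when configured."""
    headers = dict(response_headers)
    if include_client_ip:
        headers["X-Client-IP"] = client_ip  # hypothetical header name
    return headers


print(should_archive("GET", 200))   # True
print(should_archive("POST", 200))  # False
print(should_archive("GET", 404))   # False
print(stored_headers({"Content-Type": "text/html"}, "192.0.2.1", include_client_ip=True))
```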
To Appear: TPDL 2013
• SiteStory benchmark with ab and wget
o ApacheBench (ab): server stress test tool
o wget: Web page download
- All content: -p
• Local network
• Negligible difference between SiteStory and No SiteStory
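For readers who want to repeat a comparison of this kind, the two tools can be driven from a short script. This is a sketch only: the slides give the wget option -p but no ab parameters, so the request count, the concurrency, and the URL below are assumptions, as are the helper names. `timed_run` is defined but not called here, since it needs ab and wget installed and a server to test against.

```python
import shlex
import subprocess
import time


def ab_command(url, requests=1000, concurrency=10):
    """Build an ApacheBench call; -n is the request count, -c the concurrency (assumed values)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def wget_command(url):
    """Build a wget call that also fetches page requisites (-p), as on the slide."""
    return ["wget", "-p", url]


def timed_run(command):
    """Run a command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


url = "http://localhost/"  # assumed test server
print(shlex.join(ab_command(url)))    # ab -n 1000 -c 10 http://localhost/
print(shlex.join(wget_command(url)))  # wget -p http://localhost/
```

Running each command once with mod_sitestory enabled and once with it disabled, and comparing the durations from `timed_run`, would mirror the on/off comparison described here. Note that wget -p writes the downloaded files to the current directory.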
Re-executed on testbed
[Figure: testbed diagram. Labels: ws-dl-03.cs.odu.edu, x99, megalodon.lanl.gov, @AWS]
Testing with ab
Testing with wget
Round Trip Time -- Distributed
Results
• Distributed: higher variance
• Increased delay due to network
• SiteStory on vs. off: results still comparable
• Viable solution without crippling service
SiteStory Installation
• Apache module mod_sitestory
• Option to exclude a list of directories
• SiteStory Web Archive
• Trivial for existing Tomcat environments
• Tanuki Java wrapper (stand-alone) available
• Configure, open ports, go!
Or…
SiteStory Testbed
We have a SiteStory Web Archive installed for you!
1. Install and configure mod_sitestory
2. Send an email containing:
   1. Your contact info
   2. Web server IP address
   3. Server domain name used
3. Happy SiteStory’ing!
mailto: SiteStory-Testbed@googlegroups.com
http://mementoweb.github.io/SiteStory/

More Related Content

What's hot

Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Bolke de Bruin
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
PyData
 
What is Spark
What is SparkWhat is Spark
What is Spark
Bruno Faria
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
Mark Schroering
 
AWS_Data_Pipeline
AWS_Data_PipelineAWS_Data_Pipeline
AWS_Data_Pipeline
Ahasan Habib
 
Acid ORC, Iceberg and Delta Lake
Acid ORC, Iceberg and Delta LakeAcid ORC, Iceberg and Delta Lake
Acid ORC, Iceberg and Delta Lake
Michal Gancarski
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
clairvoyantllc
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Burasakorn Sabyeying
 
Scaling Graphite At Yelp
Scaling Graphite At YelpScaling Graphite At Yelp
Scaling Graphite At Yelp
Paul O'Connor
 
Airflow for Beginners
Airflow for BeginnersAirflow for Beginners
Airflow for Beginners
Varya Karpenko
 
Lighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris PeetersLighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris Peeters
Data Science Leuven
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15
Zhenxiao Luo
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
Barbara Fusinska
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big Data
InfoFarm
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the Ugly
Sarah Guido
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
Alasdair Gray
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
Sumit Maheshwari
 
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Kaxil Naik
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked Data
Albert Meroño-Peñuela
 
Semantic web and Drupal: an introduction
Semantic web and Drupal: an introductionSemantic web and Drupal: an introduction
Semantic web and Drupal: an introduction
Kristof Van Tomme
 

What's hot (20)

Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19Apache Airflow (incubating) NL HUG Meetup 2016-07-19
Apache Airflow (incubating) NL HUG Meetup 2016-07-19
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
What is Spark
What is SparkWhat is Spark
What is Spark
 
Processing genetic data at scale
Processing genetic data at scaleProcessing genetic data at scale
Processing genetic data at scale
 
AWS_Data_Pipeline
AWS_Data_PipelineAWS_Data_Pipeline
AWS_Data_Pipeline
 
Acid ORC, Iceberg and Delta Lake
Acid ORC, Iceberg and Delta LakeAcid ORC, Iceberg and Delta Lake
Acid ORC, Iceberg and Delta Lake
 
Running Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on HadoopRunning Airflow Workflows as ETL Processes on Hadoop
Running Airflow Workflows as ETL Processes on Hadoop
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Scaling Graphite At Yelp
Scaling Graphite At YelpScaling Graphite At Yelp
Scaling Graphite At Yelp
 
Airflow for Beginners
Airflow for BeginnersAirflow for Beginners
Airflow for Beginners
 
Lighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris PeetersLighthouse - an open-source library to build data lakes - Kris Peeters
Lighthouse - an open-source library to build data lakes - Kris Peeters
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Real Time Big Data
Real Time Big DataReal Time Big Data
Real Time Big Data
 
Spark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the UglySpark: The Good, the Bad, and the Ugly
Spark: The Good, the Bad, and the Ugly
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
 
Automatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked DataAutomatic Query-Centric API for Routine Access to Linked Data
Automatic Query-Centric API for Routine Access to Linked Data
 
Semantic web and Drupal: an introduction
Semantic web and Drupal: an introductionSemantic web and Drupal: an introduction
Semantic web and Drupal: an introduction
 

Viewers also liked

Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
Yasmin AlNoamany, PhD
 
Archiving the Mobile Web
Archiving the Mobile WebArchiving the Mobile Web
Archiving the Mobile Web
Frank McCown
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
Michael Nelson
 
Generating stories from Archive-It collections
Generating stories from Archive-It collectionsGenerating stories from Archive-It collections
Generating stories from Archive-It collections
Yasmin AlNoamany, PhD
 
Access Patterns for Robots and Humans in Web Archives
Access Patterns for Robots and Humans in Web ArchivesAccess Patterns for Robots and Humans in Web Archives
Access Patterns for Robots and Humans in Web Archives
Yasmin AlNoamany, PhD
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich  the Live Web Experience Through StorytellingUsing Web Archives to Enrich  the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Yasmin AlNoamany, PhD
 

Viewers also liked (6)

Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
Archiving the Mobile Web
Archiving the Mobile WebArchiving the Mobile Web
Archiving the Mobile Web
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Generating stories from Archive-It collections
Generating stories from Archive-It collectionsGenerating stories from Archive-It collections
Generating stories from Archive-It collections
 
Access Patterns for Robots and Humans in Web Archives
Access Patterns for Robots and Humans in Web ArchivesAccess Patterns for Robots and Humans in Web Archives
Access Patterns for Robots and Humans in Web Archives
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich  the Live Web Experience Through StorytellingUsing Web Archives to Enrich  the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 

Similar to Site story wadl2013

H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
Lucidworks
 
Jcdl2013 mklein
Jcdl2013 mkleinJcdl2013 mklein
Jcdl2013 mklein
Martin Klein
 
Hadoop: The Unintended Benefits
Hadoop: The Unintended BenefitsHadoop: The Unintended Benefits
Hadoop: The Unintended Benefits
DataWorks Summit
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Lucidworks
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Uwe Printz
 
This Ain't Your Parents' Search Engine
This Ain't Your Parents' Search EngineThis Ain't Your Parents' Search Engine
This Ain't Your Parents' Search Engine
Lucidworks
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Amazon Web Services
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)
Globus
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Lucidworks (Archived)
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search Engine
Grant Ingersoll
 
Introduction to Riak - Joel Jacobson
Introduction to Riak - Joel JacobsonIntroduction to Riak - Joel Jacobson
Introduction to Riak - Joel Jacobson
akqaanoraks
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
Crate.io
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Cloudian
 
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
DataScienceConferenc1
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Michael Nelson
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
Enrico Daga
 
Big data for bay area big data developer
Big data for bay area big data developerBig data for bay area big data developer
Big data for bay area big data developer
19scottmiller
 
Oracle Java & Developer Cloud Service: What It Does & Doesn't Do
Oracle Java & Developer Cloud Service: What It Does & Doesn't DoOracle Java & Developer Cloud Service: What It Does & Doesn't Do
Oracle Java & Developer Cloud Service: What It Does & Doesn't Do
Revelation Technologies
 
Talavant Data Lake Analytics
Talavant Data Lake Analytics Talavant Data Lake Analytics
Talavant Data Lake Analytics
Sean Forgatch
 

Similar to Site story wadl2013 (20)

H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
H-Hypermap - Heatmap Analytics at Scale: Presented by David Smiley, D W Smile...
 
Jcdl2013 mklein
Jcdl2013 mkleinJcdl2013 mklein
Jcdl2013 mklein
 
Hadoop: The Unintended Benefits
Hadoop: The Unintended BenefitsHadoop: The Unintended Benefits
Hadoop: The Unintended Benefits
 
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
Building a Real-Time News Search Engine: Presented by Ramkumar Aiyengar, Bloo...
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
This Ain't Your Parents' Search Engine
This Ain't Your Parents' Search EngineThis Ain't Your Parents' Search Engine
This Ain't Your Parents' Search Engine
 
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & SnowmobileData Migration Using AWS Snowball, Snowball Edge & Snowmobile
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile
 
Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)Globus Portal Framework (APS Workshop)
Globus Portal Framework (APS Workshop)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
This Ain't Your Parent's Search Engine
This Ain't Your Parent's Search EngineThis Ain't Your Parent's Search Engine
This Ain't Your Parent's Search Engine
 
Introduction to Riak - Joel Jacobson
Introduction to Riak - Joel JacobsonIntroduction to Riak - Joel Jacobson
Introduction to Riak - Joel Jacobson
 
Webinar: The Future of SQL
Webinar: The Future of SQLWebinar: The Future of SQL
Webinar: The Future of SQL
 
Dataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin BuzzwordsDataiku Flow and dctc - Berlin Buzzwords
Dataiku Flow and dctc - Berlin Buzzwords
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Big data for bay area big data developer
Big data for bay area big data developerBig data for bay area big data developer
Big data for bay area big data developer
 
Oracle Java & Developer Cloud Service: What It Does & Doesn't Do
Oracle Java & Developer Cloud Service: What It Does & Doesn't DoOracle Java & Developer Cloud Service: What It Does & Doesn't Do
Oracle Java & Developer Cloud Service: What It Does & Doesn't Do
 
Talavant Data Lake Analytics
Talavant Data Lake Analytics Talavant Data Lake Analytics
Talavant Data Lake Analytics
 

More from Martin Klein

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
Martin Klein
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
Martin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
Martin Klein
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Martin Klein
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
Martin Klein
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Martin Klein
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSync
Martin Klein
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service Optimizations
Martin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
Martin Klein
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
Martin Klein
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
Martin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
Martin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
Martin Klein
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly Artifacts
Martin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
Martin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
Martin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
Martin Klein
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
Martin Klein
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
Martin Klein
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Martin Klein
 

More from Martin Klein (20)

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly Web
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSync
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service Optimizations
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly Artifacts
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
 

Site story wadl2013

  • 1. SiteStory: Archiving Done Differently
      • WADL 2013, July 25-26th, Indianapolis, IN
      • Martin Klein, @mart1nkle1n, martinklein0815@gmail.com
      • Justin F. Brunelle, jbrunelle@cs.odu.edu
      • http://mementoweb.github.io/SiteStory/
  • 2. LANL SiteStory Team lead developer
  • 3. Archiving - the traditional way
      • Actively crawl the web
      • For example, using Heritrix
  • 4. Archiving - the traditional way
      • Issues with crawler-based archiving:
          • Requests can be rejected (robots.txt, user-agent, IP)
          • Can be deceived (geo-location, user-agent)
          • Can be trapped (crawl my calendar!)
          • Requires constant and massive bandwidth
          • Implied timing problem: when to crawl?
  • 5. Archiving - the traditional way
      • Timing problem: Update 1 viewed but not archived
      • Timeline: t1 R created, t2 browser visit 1, t3 crawler visit 1, t4 R update 1, t5 browser visit 2, t6 R update 2
  • 6. Archiving - the SiteStory way
      • Transactional Web archiving
      • Archive accepts HTTP transaction between browser and server
  • 7. Archiving - the traditional way
      • Timing problem: Update 1 viewed and archived
      • Timeline: t1 R created, t2 browser visit 1, t3 crawler visit 1, t4 R update 1, t5 browser visit 2, t6 R update 2
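The timing problem on the two timeline slides can be replayed in a few lines. The following is a minimal sketch, not part of the talk: the `Event` type and the `archived_versions` helper are hypothetical names, and the model assumes that a capture happens exactly when the archiving actor (crawler or browser) requests resource R.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: int    # position on the slide timeline (t1..t6)
    actor: str   # "server", "browser", or "crawler"
    action: str  # for the server: the version of R it publishes


# Timeline as shown on the slides.
TIMELINE = [
    Event(1, "server", "created"),
    Event(2, "browser", "visit1"),
    Event(3, "crawler", "visit1"),
    Event(4, "server", "update1"),
    Event(5, "browser", "visit2"),
    Event(6, "server", "update2"),
]


def archived_versions(timeline, archiving_actor):
    """Versions of R captured when requests by `archiving_actor` trigger a capture."""
    current = None
    captured = []
    for ev in sorted(timeline, key=lambda e: e.time):
        if ev.actor == "server":
            current = ev.action          # R changes on the server
        elif ev.actor == archiving_actor and current is not None:
            if current not in captured:  # record each version once
                captured.append(current)
    return captured


# Crawler-based archiving: only the crawler's visit produces a capture.
crawler_archive = archived_versions(TIMELINE, "crawler")
# Transactional archiving: every browser request produces a capture.
transactional_archive = archived_versions(TIMELINE, "browser")

print(crawler_archive)        # ['created']
print(transactional_archive)  # ['created', 'update1']
```

In this model update 2 is captured by neither approach, because nothing requests R after t6. A transactional archive only holds versions that were actually served to a client.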
  • 9. Archiving - the SiteStory way
      • Challenges with transactional archiving:
          • To be archived, the server has to cooperate
          • Transfer data to archive, batch mode or real-time
          • Archive must trust transmission to be authentic
          • Resources from external servers have to be archived out-of-band
          • Deduplication challenges
              • Alias: different URI, same response
              • Conneg: same URI, different response
          • Determine “significant” content change
  • 10. SiteStory Status Quo
      • mod_sitestory sends HTTP PUT to SiteStory Web Archive upon client’s GET request
          • not for POST, DELETE, etc.
          • for HTTP response codes 200, 302, 303
      • Client IP can be included in stored headers, configurable
      • Header info stored in BerkeleyDB, response body in FS
      • Dedup via hash(body)
      • Offloading content as WARC files possible (read: recommended)
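The capture rule and the hash-based deduplication on the status quo slide can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not SiteStory's implementation: `should_archive` and `ToyArchive` are hypothetical names, SHA-256 is an assumed choice (the slide only says hash(body)), and plain Python containers stand in for BerkeleyDB and the file system.

```python
import hashlib

# From the slide: only GET requests with these response codes are archived.
ARCHIVED_METHODS = {"GET"}
ARCHIVED_STATUS = {200, 302, 303}


def should_archive(method: str, status: int) -> bool:
    """True if a transaction matches the capture rule listed on the slide."""
    return method.upper() in ARCHIVED_METHODS and status in ARCHIVED_STATUS


class ToyArchive:
    """In-memory stand-in: header records in a list, bodies keyed by digest."""

    def __init__(self):
        self.bodies = {}   # digest -> body bytes (stands in for the file system)
        self.records = []  # (uri, timestamp, headers, digest) (stands in for BerkeleyDB)

    def put(self, uri, timestamp, headers, body: bytes) -> bool:
        """Store one transaction; return True if the body was not seen before."""
        digest = hashlib.sha256(body).hexdigest()  # assumed hash function
        is_new = digest not in self.bodies
        if is_new:
            self.bodies[digest] = body
        # Assumption: a header record is kept per transaction, even for a known body.
        self.records.append((uri, timestamp, dict(headers), digest))
        return is_new


archive = ToyArchive()
html = b"<p>hello</p>"
first = archive.put("http://example.org/a", 1, {"Content-Type": "text/html"}, html)
alias = archive.put("http://example.org/b", 2, {"Content-Type": "text/html"}, html)
print(first, alias, len(archive.bodies), len(archive.records))
```

The second `put` shows the alias case from the challenges slide: a different URI with the same response body is stored once. Content negotiation is the opposite case, where the same URI yields different bodies and therefore different digests.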
  • 11. To Appear: TPDL 2013
      • SiteStory benchmark with ab and wget
          • ApacheBench (ab): server stress test tool
          • wget: Web page download
              • All content: -p
      • Local network
      • Negligible difference between SiteStory and No SiteStory
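For readers who want to try a comparable measurement, the sketch below only assembles the two command lines and does not run them. The target URL, request count, and concurrency are placeholder assumptions, not the parameters used in the benchmark; `-n` and `-c` are ab's options for total and concurrent requests, and `-p` is the wget option named on the slide.

```python
import shlex


def ab_command(url, requests, concurrency):
    """Argument list for ApacheBench: -n total requests, -c concurrent requests."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def wget_command(url):
    """Argument list for wget; -p also downloads the page requisites (images, CSS, ...)."""
    return ["wget", "-p", url]


# Hypothetical target; replace with the server under test.
TARGET = "http://localhost/"

ab_line = shlex.join(ab_command(TARGET, requests=1000, concurrency=10))
wget_line = shlex.join(wget_command(TARGET))
print(ab_line)
print(wget_line)
```

Running each command once with mod_sitestory enabled and once with it disabled would give the on/off pairs that the later slides compare.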
  • 12. Re-executed on testbed ws-dl-03.cs.odu.edu x99 ,… , , megalodon.lanl.gov @AWS
  • 13. Testing with ab
  • 14. Testing with wget
  • 15. Round Trip Time -- Distributed
  • 16. Results
      • Distributed: higher variance
      • Increased delay due to network
      • On vs. off comparison still comparable
      • Viable solution without crippling service
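An on vs. off comparison of this kind boils down to comparing the mean and spread of two sets of timings. The sketch below shows one way to do that; the helper names are hypothetical and the sample values are made-up placeholders, not measurements from the talk.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return mean(samples), stdev(samples)


def relative_overhead(on_samples, off_samples):
    """Mean slowdown with archiving on, as a fraction of the mean with it off."""
    return (mean(on_samples) - mean(off_samples)) / mean(off_samples)


# Placeholder round-trip times in milliseconds (illustrative only).
off_ms = [10.0, 12.0, 14.0]
on_ms = [11.0, 13.0, 15.0]

off_mean, off_sd = summarize(off_ms)
on_mean, on_sd = summarize(on_ms)
overhead = relative_overhead(on_ms, off_ms)

print(f"off: mean={off_mean:.1f} ms, sd={off_sd:.1f} ms")
print(f"on:  mean={on_mean:.1f} ms, sd={on_sd:.1f} ms")
print(f"relative overhead: {overhead:.1%}")
```

A higher standard deviation in a distributed setup, as the slide reports, would show up in `off_sd` and `on_sd` alike, which is why the overhead is best judged relative to that spread.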
  • 17. SiteStory Installation
      • Apache module mod_sitestory
          • Option to exclude a list of directories
      • SiteStory Web Archive
          • Trivial for existing Tomcat environments
          • Tanuki Java wrapper (stand-alone) available
      • Configure, open ports, go! Or…
  • 18. SiteStory Testbed
      • We have a SiteStory Web Archive installed for you!
      1. Install and configure mod_sitestory
      2. Send an email containing:
          1. Your contact info
          2. Web server IP address
          3. Server domain name used
      3. Happy SiteStory’ing!
      • mailto: SiteStory-Testbed@googlegroups.com
      • http://mementoweb.github.io/SiteStory/