Sawood Alam, Internet Archive
Michael L. Nelson, Old Dominion University
Michele C. Weigle, Old Dominion University
Daniel Gomes, Arquivo.pt
Summarize Your Archival Holdings With MementoMap
IIPC Web Archiving Conference, June 16, 2021
#MementoMap
@ibnesayeed
$ memgator -f cdxj http://si.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
13263 web.archive.org
3590 wayback.archive-it.org
1202 web.archive.bibalex.org
651 webarchive.loc.gov
321 arquivo.pt
32 wayback.vefsafn.is
11 web.archive.org.au
3 archive.is
1 www.webarchive.org.uk
1 swap.stanford.edu
1 perma.cc
$ memgator -f cdxj http://odu.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
3071 web.archive.org
796 wayback.archive-it.org
751 web.archive.bibalex.org
99 webarchive.loc.gov
26 arquivo.pt
2 archive.is
1 wayback.vefsafn.is
Cross-Archive Memento Lookup With MemGator
Although there are 13k+ mementos in IA, there are also mementos in 10 other public web archives.
https://github.com/oduwsdl/MemGator
ODU is less popular, but there are mementos in 7 different web archives.
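The per-archive counts above can also be produced in Python. The following is a minimal sketch, not part of MemGator: it takes CDXJ TimeMap lines from any iterable, skips the "!"-prefixed header lines, and counts the third "/"-separated field (the archive host), mirroring the grep, cut, and uniq steps of the shell pipeline. The function name and the sample lines are illustrative assumptions; real MemGator output carries more fields per line.

```python
from collections import Counter


def count_archive_hosts(cdxj_lines):
    """Count mementos per archive host, most frequent first."""
    counts = Counter()
    for line in cdxj_lines:
        if line.startswith("!"):      # same filter as: grep -v "^!"
            continue
        fields = line.split("/")      # same split as: cut -d'/' -f3
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts.most_common()       # highest counts first, like: sort -nr


# Hypothetical input shaped like a CDXJ TimeMap (assumption, not real output).
sample = [
    '!context ["http://tools.ietf.org/html/rfc7089"]',
    '20100810032449 {"uri": "http://wayback.vefsafn.is/wayback/20100810032449/http://odu.edu/"}',
    '20110101000000 {"uri": "http://web.archive.org/web/20110101000000/http://odu.edu/"}',
    '20120101000000 {"uri": "http://web.archive.org/web/20120101000000/http://odu.edu/"}',
]
print(count_archive_hosts(sample))
```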
Who Would Have Thought to Look Up in the Icelandic Web Archive for odu.edu Mementos?
http://wayback.vefsafn.is/wayback/20100810032449/http://odu.edu/
Prevalence of Sample Query URI Sets in Archives
Sample (1M URIs Each)   In Archive-It   In UKWA   In Stanford   Union {AIT, UK, SU}
DMOZ                    4.097%          3.594%    0.034%        7.575%
MementoProxy            4.182%          0.408%    0.046%        4.527%
IAWayback               3.716%          0.519%    0.039%        4.165%
UKWayback               0.108%          0.034%    0.002%        0.134%
Alam et al., “Web Archive Profiling Through CDX Summarization”, IJDL 2016
Why Aggregate Small Archives?
● Wayback Machine does not cover everything
● Archives often have unique mementos (small overlap)
● Linguistic and geolocation diversity
● High-quality curated collections
● Restricted resources and private archives
MemGator Broadcasting
MemGator Log Responses From Various Archives
93% of the requests made from MemGator to upstream archives were wasteful.
Only about one third of the requests to the largest web archive (IA) were a hit.
Aggregation Is Great, But Broadcasting Is Wasteful
What do we want? Aggregate all archives, large or small
What’s the problem? Broadcasting is wasteful and problematic
What’s the solution? Selectively poll archives that are likely to return good results for a lookup URI
How to identify those? Profile web archives
How to profile archives? MementoMap Framework
Sawood Alam, “MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing”, Doctoral Dissertation, ODU, 2020
If Only Archives Could Tell What to Ask Them For
● Websites advertise their holdings using sitemap.xml; why can’t archives?
○ Archives have billions or even trillions of URI-Ms
○ Such exhaustive lists would go stale very quickly
● How about robots.txt?
○ It is compact, but it is an exclusion format; it does not tell what the site has
○ It assumes a single domain; patterns are for paths (not the domain name)
● How about well-known URIs?
○ Good for automated discovery of domain-specific metadata resources
● How about combining these ideas?
○ Introducing MementoMap!
Memento Lookup Routing
Let us fix the broadcasting issue with more informed routing.
Archive Profiling Strategies
● Complete URI-R Profiling (1 URI-R = 1 Profile Key) [Sanderson et al., TPDL 2012]
○ bbc.co.uk/images/logo.png?w=90
○ cnn.com/2014/03/15/?id=128734
● TLD-Only Profiling (1 TLD = 1 Profile Key) [AlSum et al., TPDL 2013]
○ *.com
○ *.uk
● Middle Ground
○ *.cnn.com
○ *.co.uk
○ *.bbc.co.uk
○ bbc.co.uk/images/*
We explore these strategies in this work.
Top three archives after IA produce full TimeMaps 52% of the time.
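The three strategies differ only in how much of a URI-R survives in the profile key. The following is a minimal sketch of that idea, using the slide's own "*.domain" and "path/*" notation rather than any particular serialization. The function name and its two parameters are hypothetical and are not taken from the cited papers.

```python
from urllib.parse import urlsplit


def profile_key(uri_r, host_labels=None, path_segments=None):
    """Reduce a URI-R to a profile key at the requested granularity.

    host_labels:   keep only this many rightmost host labels (TLD-only when 1).
    path_segments: keep the full host plus this many leading path segments.
    With neither set, the complete URI-R is the key.
    """
    parts = urlsplit(uri_r if "://" in uri_r else "http://" + uri_r)
    host = parts.hostname or ""
    labels = host.split(".")
    if host_labels is not None and host_labels <= len(labels):
        return "*." + ".".join(labels[-host_labels:])
    segments = [s for s in parts.path.split("/") if s]
    if path_segments is not None and path_segments < len(segments):
        return host + "/" + "/".join(segments[:path_segments] + ["*"])
    key = host + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


uri = "bbc.co.uk/images/logo.png?w=90"
print(profile_key(uri))                    # complete URI-R profiling
print(profile_key(uri, host_labels=1))     # TLD-only profiling
print(profile_key(uri, host_labels=2))     # middle ground, host level
print(profile_key(uri, path_segments=1))   # middle ground, path level
```

Coarser keys give a smaller profile with more false positives; finer keys give a larger profile that is more precise.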
MementoMap Framework Components
● Ingestion
○ CDX files/API
○ Fulltext search
○ Access logs
○ Sample URIs
● Summarization and Serialization
○ Resource constraints
○ Application-specific variants
● Memento Routing
○ Integration with aggregators
What is Archived in Arquivo.pt?
What is Accessed from MemGator?
2B URI-Rs that have 1-9 mementos each in Arquivo.pt were never requested from ODU’s MemGator server.
43 URI-Rs were requested thousands of times each, but had zero mementos in Arquivo.pt.
45 URI-Rs that had tens of mementos each were requested hundreds of times.
Blind spot of a usage-based profile
Blind spot of a content-based profile
Who Bears the Cost of Bad Routing Decisions?
                                       Actual: Present in the Archive   Actual: Not in the Archive
Predicted: Routed to the Archive       True Positive (TP)               False Positive (FP)
Predicted: Not Routed to the Archive   False Negative (FN)              True Negative (TN)
FP: Wasteful (Infrastructure suffers)
FN: Disuse (Users suffer)
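The four cells can be tallied mechanically once each lookup is recorded as a pair of booleans. The following is a minimal sketch; the function names and the sample decisions are illustrative assumptions, not measurements from the study.

```python
from collections import Counter


def routing_outcome(routed, present):
    """Classify one routing decision into TP, FP, FN, or TN."""
    if routed:
        return "TP" if present else "FP"   # FP: wasted request to the archive
    return "FN" if present else "TN"       # FN: the user misses a held memento


def tally(decisions):
    """Count outcomes over (routed, present) pairs."""
    return Counter(routing_outcome(routed, present) for routed, present in decisions)


# Hypothetical decisions: (routed to the archive?, actually present?)
decisions = [(True, True), (True, False), (True, False), (False, True), (False, False)]
print(tally(decisions))
```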
URI Canonicalization and SURT
https://news.bbc.co.uk/images/Logo.png?width=200&height=80&rotate=90%C2%B0#top
http://www.news.BBC.co.uk/images/Logo.png?width=200&height=80&rotate=90%c2%b0#top
http://www.news.bbc.co.uk/images/Logo.png?rotate=90%c2%B0&width=200&height=80
http://NEWS.BBC.CO.UK:80//images//Logo.png?height=80&width=200&rotate=90%c2%b0#top
Canonicalization: news.bbc.co.uk/images/Logo.png?height=80&rotate=90%C2%B0&width=200
SURT: uk,co,bbc,news,)/images/logo.png?height=80&rotate=90%c2%b0&width=200
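The two steps can be approximated in a few lines of Python. The following is a simplified sketch under stated assumptions, not the canonicalizer used by any archive: it drops the scheme, a leading "www.", any port, and the fragment, collapses repeated slashes, sorts query parameters, reverses the host labels, and lowercases the result. It emits the host without the trailing comma before ")", as in keys like "org,arxiv)/" elsewhere in this deck, so its output differs from the SURT line above by that one comma. The function name is hypothetical.

```python
import re
from urllib.parse import urlsplit


def to_surt(uri):
    """Canonicalize a URI and return a simplified SURT-style key."""
    if "://" not in uri:
        uri = "http://" + uri              # the scheme is discarded anyway
    parts = urlsplit(uri)                  # the fragment is split off here
    host = parts.hostname or ""            # lowercased, port removed
    if host.startswith("www."):
        host = host[4:]
    path = re.sub(r"/+", "/", parts.path) or "/"
    query = "&".join(sorted(parts.query.split("&"))) if parts.query else ""
    key = ",".join(reversed(host.split("."))) + ")" + path
    if query:
        key += "?" + query
    return key.lower()


variants = [
    "https://news.bbc.co.uk/images/Logo.png?width=200&height=80&rotate=90%C2%B0#top",
    "http://www.news.BBC.co.uk/images/Logo.png?width=200&height=80&rotate=90%c2%b0#top",
    "http://www.news.bbc.co.uk/images/Logo.png?rotate=90%c2%B0&width=200&height=80",
    "http://NEWS.BBC.CO.UK:80//images//Logo.png?height=80&width=200&rotate=90%c2%b0#top",
]
print({to_surt(v) for v in variants})      # all four collapse to a single key
```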
CDX/CDXJ Summarization
http://archive.org/web/researcher/cdx_file_format.php
SURT Representation With Wildcard
Original SURTs did not have wildcards. We introduced them for dynamic profiling.
In practice, the common “http://(” prefix is removed.
Shape of URI Key Tree of Arquivo.pt
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
A MementoMap Example
!context ["http://oduwsdl.github.io/contexts/ukvs"]
!id {uri: "http://archive.example.org/"}
!fields {keys: ["surt"], values: ["frequency"]}
!meta {type: "MementoMap", name: "A Test Web Archive", year: 1996}
!meta {updated_at: "2018-09-03T13:27:52Z"}
* 54321/20000
com,* 10000+
org,arxiv)/ 100
org,arxiv)/* 2500~/900
org,arxiv)/pdf/* 0
uk,co,bbc)/images/* 300+/20-
https://github.com/oduwsdl/ORS/blob/master/ukvs.md
Goodbye HmPn/DLim static profiling policies, thanks to our SURT with wildcard.
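The example above is line-oriented: header lines start with "!" and every other line is a key followed by a value. The following is a minimal reader sketch based only on that visible structure. It keeps header and entry values as raw strings and does not interpret suffixes such as "+", "-", or "~"; see the UKVS document linked above for their meaning. The function name is hypothetical.

```python
def parse_ukvs(text):
    """Split UKVS text into (headers, entries) without interpreting values."""
    headers, entries = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("!"):
            name, _, value = line[1:].partition(" ")
            headers.append((name, value))      # e.g. ("meta", "{...}")
        else:
            key, _, value = line.partition(" ")
            entries[key] = value.strip()       # raw value string
    return headers, entries


example = """\
!context ["http://oduwsdl.github.io/contexts/ukvs"]
!id {uri: "http://archive.example.org/"}
!fields {keys: ["surt"], values: ["frequency"]}
!meta {type: "MementoMap", name: "A Test Web Archive", year: 1996}
!meta {updated_at: "2018-09-03T13:27:52Z"}
* 54321/20000
com,* 10000+
org,arxiv)/ 100
org,arxiv)/* 2500~/900
org,arxiv)/pdf/* 0
uk,co,bbc)/images/* 300+/20-
"""
headers, entries = parse_ukvs(example)
print(len(headers), "header lines,", len(entries), "entries")
```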
MementoMap
https://github.com/oduwsdl/MementoMap
$ mementomap
Usage: mementomap [-h] {generate,compact,lookup,batchlookup} ...
Positional Arguments:
  {generate,compact,lookup,batchlookup}
    generate     Generate a MementoMap from a sorted file with the
                 first columns as SURT (e.g., CDX/CDXJ)
    compact      Compact a large MementoMap file into a small one
    lookup       Search for a URI/SURT into a MementoMap
    batchlookup  Search for a list of URIs/SURTs into a MementoMap
Optional Arguments:
  -h, --help     Show this help message and exit
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
Processed Lines vs. Compacted MementoMap Growth
com,example)/a/1/x
com,example)/a/2
com,example)/a/3
com,example)/b/1
com,example)/b/2
com,example)/c/1
com,example)/a/*
com,example)/b/1
com,example)/b/2
com,example)/c/1
com,example)/*
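The key lists above show three stages: six full keys, then the keys under "a/" rolled up into "com,example)/a/*", then everything rolled up into "com,example)/*". The following is a simplified sketch of one roll-up pass. It assumes a plain threshold on the number of distinct children at a given path depth, which is an illustrative stand-in and not the weighting used by the mementomap tool. The function name and the max_children parameter are hypothetical.

```python
from collections import defaultdict


def rollup(keys, depth, max_children):
    """Replace sibling keys with a wildcard when a node has too many children."""
    groups = defaultdict(list)
    for key in keys:
        host, _, path = key.partition(")/")
        segments = path.split("/") if path else []
        groups[(host, tuple(segments[:depth]))].append((key, segments))
    out = []
    for (host, head), members in groups.items():
        children = {segs[depth] for _, segs in members if len(segs) > depth}
        if len(children) >= max_children:
            out.append(host + ")/" + "/".join(head + ("*",)))
        else:
            out.extend(key for key, _ in members)
    return sorted(out)


keys = [
    "com,example)/a/1/x", "com,example)/a/2", "com,example)/a/3",
    "com,example)/b/1", "com,example)/b/2", "com,example)/c/1",
]
stage2 = rollup(keys, depth=1, max_children=3)
stage3 = rollup(stage2, depth=0, max_children=3)
print(stage2)
print(stage3)
```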
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
MementoMap Generation, Compaction, and Lookup
Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
1.5% Relative Cost yields 60% Accuracy.
Arquivo.pt can save 60% of wasted traffic by publishing a 119MB summary file!
Why Profile Archival Voids?
$ curl -I https://web.archive.org/web/https://quora.com/
HTTP/1.1 403 FORBIDDEN
Server: nginx/1.15.8
Date: Wed, 02 Dec 2020 20:39:33 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Server-Timing: captures_list;dur=0.150497
X-App-Server: wwwb-app58
X-ts: 403
The Internet Archive has many “*.com” domains, but it may not want to capture or replay some.
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
Archival Voids Profiles Reduce False Positives
org,arxiv)/abs/a 40
org,arxiv)/abs/b 23
org,arxiv)/abs/c 17
org,arxiv)/format/a 15
org,arxiv)/format/b 20
org,arxiv)/format/c 10
org,arxiv)/search/a 30
...
org,arxiv)/abs/* 80
org,arxiv)/format/* 45
org,arxiv)/search/* 60
org,arxiv)/* 185
org,arxiv)/abs/d
False Positive
org,arxiv)/pdf/a
org,arxiv)/pdf/b
org,arxiv)/pdf/c
False Positive
org,arxiv)/* 185
org,arxiv)/pdf/* 0
How about summarizing frequently accessed URIs an archive does not hold?
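The effect of the zero-valued "org,arxiv)/pdf/*" entry can be shown with a most-specific-match lookup. The following is a minimal sketch, assuming an in-memory dictionary of SURT keys and integer counts; it is not the lookup implemented by the mementomap tool. The function names and the choice not to route when nothing matches are assumptions.

```python
def candidate_keys(surt):
    """Yield the key itself, then ever broader wildcard keys."""
    host, _, path = surt.partition(")")
    yield surt
    segments = [s for s in path.split("/") if s]
    for i in range(len(segments) - 1, -1, -1):
        yield host + ")/" + "/".join(segments[:i] + ["*"])
    labels = host.split(",")
    for i in range(len(labels) - 1, 0, -1):
        yield ",".join(labels[:i] + ["*"])
    yield "*"


def should_route(surt, profile):
    """Route only if the most specific matching key has a non-zero count."""
    for key in candidate_keys(surt):
        if key in profile:
            return profile[key] > 0
    return False                            # assumption: no match, no request


holdings = {"org,arxiv)/*": 185}
voids = {"org,arxiv)/pdf/*": 0}
combined = {**holdings, **voids}            # merged only at lookup time

print(should_route("org,arxiv)/pdf/a", holdings))   # routed: a false positive
print(should_route("org,arxiv)/pdf/a", combined))   # void entry blocks the request
print(should_route("org,arxiv)/abs/a", combined))   # other paths are unaffected
```

The two dictionaries stay separate and are merged only for the lookup, so the voids profile can be published and refreshed on its own schedule.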
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
404-Only Frequencies and Request Savings
An archival voids profile of 2.4k URIs that were each accessed hundreds of times or more could have saved about 8.4% of wasted requests.
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
Archival Voids Recommendations
● Keep archival voids profiles separate from archival holdings
● Update often
● Use specific keys, and only with high confidence
● Profile only resources that are in high demand
● Archives themselves are better sources of truth than external observers
Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
Dissemination and Discovery Methods
Well-known URI
GET /.well-known/mementomap HTTP/1.1
Host: arquivo.pt
Link Header
Link: <https://arquivo.pt/path/to/mementomap.ukvs>; rel="mementomap"
Link HTML Element
<link href="https://arquivo.pt/path/to/mementomap.ukvs" rel="mementomap">
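A client could try the three discovery methods with the Python standard library alone. The following is a minimal sketch that works on strings already fetched by the caller, so it performs no network requests. The function names are hypothetical, and the Link header parser is a simplification that handles the common quoted form rather than the full Web Linking grammar.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def well_known_url(archive_base):
    """Build the well-known MementoMap URI for an archive."""
    return urljoin(archive_base, "/.well-known/mementomap")


def from_link_header(value):
    """Return the target of the first link with rel="mementomap", if any."""
    for target, params in re.findall(r"<([^>]+)>((?:\s*;\s*[^,;]+)*)", value):
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel and "mementomap" in rel.group(1).split():
            return target
    return None


class _LinkFinder(HTMLParser):
    """Remember the href of the first <link rel="mementomap"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if tag == "link" and self.href is None and "mementomap" in rels:
            self.href = attributes.get("href")


def from_html(html):
    finder = _LinkFinder()
    finder.feed(html)
    return finder.href


print(well_known_url("https://arquivo.pt/some/page"))
print(from_link_header('<https://arquivo.pt/path/to/mementomap.ukvs>; rel="mementomap"'))
print(from_html('<link href="https://arquivo.pt/path/to/mementomap.ukvs" rel="mementomap">'))
```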
MementoMap Adoption Path
● PWA, UKWA, and NLA have shown interest
● PyWB archival replay system is open for implementation
● MemGator and LANL’s Time Travel service are interested
● Big web archives can start with publishing archival voids
○ No need to profile IA
● Archives with access restrictions can have multiple MementoMaps
● Third parties can create and publish MementoMaps of the rest of the archives while they catch up
● Coexist with the ongoing IIPC-funded Bloom filters project
MementoMap Call for Adoption
🕮
MementoMap Framework (Doctoral Dissertation)
https://digitalcommons.odu.edu/computerscience_etds/129/
Unified Key Value Store (UKVS)
https://github.com/oduwsdl/ORS/blob/master/ukvs.md
⚙
MementoMap CLI
https://github.com/oduwsdl/MementoMap
MemGator
https://github.com/oduwsdl/MemGator
$ mementomap generate --hcf=4.0 --pcf=2.0 index.cdx[j] mementomap.ukvs
# Provide sorted list of SURTs to STDIN if not using CDX[J] index
$ scp mementomap.ukvs ${WEBHOST}:${WEBROOT}/.well-known/mementomap
# Preferably, compress the file and allow content negotiation
✉
Email: sawood@archive.org
Twitter: @ibnesayeed
IIPC Slack: #mementomap

More Related Content

What's hot

iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
Justin Brunelle
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
Michael Nelson
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Michael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Michael Nelson
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Justin Brunelle
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URI
LulwahMA
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Michael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
Michael Nelson
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
Yasmin AlNoamany, PhD
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
Michael Nelson
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
Access Innovations, Inc.
 
Enabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web Archives
Michele Weigle
 
Linked Data + Drupal for Oceanographic data management
Linked Data + Drupal for Oceanographic data managementLinked Data + Drupal for Oceanographic data management
Linked Data + Drupal for Oceanographic data management
Biological and Chemical Oceanography Data Management Office
 
URI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked DataURI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked Databutest
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
Martin Klein
 
Establishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web PagesEstablishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web Pages
maturban
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
Mat Kelly
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
Herbert Van de Sompel
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
Steffen Staab
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Michael Nelson
 

What's hot (20)

iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URI
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
Enabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web Archives
 
Linked Data + Drupal for Oceanographic data management
Linked Data + Drupal for Oceanographic data managementLinked Data + Drupal for Oceanographic data management
Linked Data + Drupal for Oceanographic data management
 
URI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked DataURI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked Data
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
Establishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web PagesEstablishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web Pages
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 

Similar to Summarize Your Archival Holdings With MementoMap

Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento Routing
Sawood Alam
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination Framework
Sawood Alam
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
Michael Nelson
 
JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive Profiling
Sawood Alam
 
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
National Information Standards Organization (NISO)
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive Profiling
Sawood Alam
 
Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
Evaluating Methods to Rediscover Missing Web Pages from the Web InfrastructureEvaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
Martin Klein
 
Aqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State UniversityAqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State University
youthelectronix
 
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertationSearch Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Denis Shestakov
 
Exploiter le Web Semantic, le comprendre et y contribuer
Exploiter le Web Semantic, le comprendre et y contribuerExploiter le Web Semantic, le comprendre et y contribuer
Exploiter le Web Semantic, le comprendre et y contribuer
Mathieu d'Aquin
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersjudell
 
Internet Mashups
Internet MashupsInternet Mashups
Internet Mashups
Cesare Pautasso
 
EIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open DataEIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open Data
Antoine Isaac
 
JahiaOne - Semantic Web with Jahia
JahiaOne - Semantic Web with JahiaJahiaOne - Semantic Web with Jahia
JahiaOne - Semantic Web with Jahia
Jahia Solutions Group
 
Online Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museums
mherbison
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
Joshua Shinavier
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the Internet
IRJET Journal
 
進行中
進行中進行中
進行中maolins
 
進行中
進行中進行中
進行中maolins
 

Similar to Summarize Your Archival Holdings With MementoMap (20)

Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento Routing
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination Framework
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive Profiling
 
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
November 19, 2014 NISO Virtual Conference: Can't We All Work Together?: Inter...
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive Profiling
 
Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
Evaluating Methods to Rediscover Missing Web Pages from the Web InfrastructureEvaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
Evaluating Methods to Rediscover Missing Web Pages from the Web Infrastructure
 
Aqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State UniversityAqua Browser Implementation at Oklahoma State University
Aqua Browser Implementation at Oklahoma State University
 
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertationSearch Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
 
Exploiter le Web Semantic, le comprendre et y contribuer
Exploiter le Web Semantic, le comprendre et y contribuerExploiter le Web Semantic, le comprendre et y contribuer
Exploiter le Web Semantic, le comprendre et y contribuer
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
Representing the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makers
 
Internet Mashups
Internet MashupsInternet Mashups
Internet Mashups
 
EIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open DataEIFL 2014 - Linked Open Data
EIFL 2014 - Linked Open Data
 
JahiaOne - Semantic Web with Jahia
JahiaOne - Semantic Web with JahiaJahiaOne - Semantic Web with Jahia
JahiaOne - Semantic Web with Jahia
 
Online Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museums
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the Internet
 
進行中
進行中進行中
進行中
 
進行中
進行中進行中
進行中
 

More from Sawood Alam

TrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesTrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web Pages
Sawood Alam
 
CDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsCDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection Insights
Sawood Alam
 
Video Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback MachineVideo Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback Machine
Sawood Alam
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File Format
Sawood Alam
 
MemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoMemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in Go
Sawood Alam
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to Containerization
Sawood Alam
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorker
Sawood Alam
 
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerClient-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Sawood Alam
 
Introducing Web Archiving and WSDL Research Group
Summarize Your Archival Holdings With MementoMap

  • 1. Sawood Alam, Internet Archive Michael L. Nelson, Old Dominion University Michele C. Weigle, Old Dominion University Daniel Gomes, Arquivo.pt Summarize Your Archival Holdings With MementoMap IIPC Web Archiving Conference, June 16, 2021 #MementoMap @ibnesayeed
  • 3. Cross-Archive Memento Lookup With MemGator
    https://github.com/oduwsdl/MemGator

    $ memgator -f cdxj http://si.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
      13263 web.archive.org
       3590 wayback.archive-it.org
       1202 web.archive.bibalex.org
        651 webarchive.loc.gov
        321 arquivo.pt
         32 wayback.vefsafn.is
         11 web.archive.org.au
          3 archive.is
          1 www.webarchive.org.uk
          1 swap.stanford.edu
          1 perma.cc
    Although there are 13k+ mementos in IA, there are also mementos in 10 other public web archives.

    $ memgator -f cdxj http://odu.edu/ | grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr
       3071 web.archive.org
        796 wayback.archive-it.org
        751 web.archive.bibalex.org
         99 webarchive.loc.gov
         26 arquivo.pt
          2 archive.is
          1 wayback.vefsafn.is
    ODU is less popular, but there are mementos in 7 different web archives.
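The per-archive counting done by the shell pipeline on slide 3 can also be sketched in Python. This is only an illustration: it assumes CDXJ lines shaped roughly like MemGator's output (metadata lines starting with `!`, then a datetime followed by a JSON object holding the memento URI). The `SAMPLE` lines and the helper name `count_archive_hosts` are made up for this example.

```python
from collections import Counter

# Illustrative CDXJ-like lines (assumed shape, not real MemGator output).
SAMPLE = [
    '!context ["http://tools.ietf.org/html/rfc7089"]',
    '20100810032449 {"uri": "http://wayback.vefsafn.is/wayback/20100810032449/http://odu.edu/"}',
    '20110101000000 {"uri": "http://web.archive.org/web/20110101000000/http://odu.edu/"}',
    '20120101000000 {"uri": "http://web.archive.org/web/20120101000000/http://odu.edu/"}',
]

def count_archive_hosts(cdxj_lines):
    """Mimic: grep -v "^!" | cut -d'/' -f3 | sort | uniq -c | sort -nr"""
    hosts = Counter()
    for line in cdxj_lines:
        if line.startswith("!"):      # grep -v "^!": skip metadata lines
            continue
        fields = line.split("/")      # cut -d'/'
        if len(fields) >= 3:
            hosts[fields[2]] += 1     # -f3 is the archive host name
    return hosts.most_common()        # highest count first

if __name__ == "__main__":
    for host, count in count_archive_hosts(SAMPLE):
        print(f"{count:7d} {host}")
```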
  • 4. Who Would Have Thought to Look Up odu.edu Mementos in the Icelandic Web Archive?
    http://wayback.vefsafn.is/wayback/20100810032449/http://odu.edu/
  • 5. Prevalence of Sample Query URI Sets in Archives

    Sample (1M URIs Each)   In Archive-It   In UKWA   In Stanford   Union {AIT, UK, SU}
    DMOZ                    4.097%          3.594%    0.034%        7.575%
    MementoProxy            4.182%          0.408%    0.046%        4.527%
    IAWayback               3.716%          0.519%    0.039%        4.165%
    UKWayback               0.108%          0.034%    0.002%        0.134%

    Alam et al., “Web Archive Profiling Through CDX Summarization”, IJDL 2016
  • 6. Why Aggregate Small Archives?
    ● Wayback Machine does not cover everything
    ● Archives often have unique mementos (small overlap)
    ● Linguistic and geolocation diversity
    ● High-quality curated collections
    ● Restricted resources and private archives
  • 13. MemGator Log Responses From Various Archives
    93% of the requests made from MemGator to upstream archives were wasteful.
    Only about one third of the requests to the largest web archive (IA) were a hit.
  • 14. Aggregation Is Great, But Broadcasting Is Wasteful
    What do we want? Aggregate all archives, large or small
    What’s the problem? Broadcasting is wasteful and problematic
    What’s the solution? Selectively poll archives that are likely to return good results for a lookup URI
    How to identify those? Profile web archives
    How to profile archives? MementoMap Framework
    Sawood Alam, “MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing”, Doctoral Dissertation, ODU, 2020
  • 15. If Only Archives Could Tell What to Ask Them For
    ● Websites advertise their holdings using sitemap.xml; why can’t archives?
      ○ Archives have billions or even trillions of URI-Ms
      ○ Such exhaustive lists would go stale very quickly
    ● How about robots.txt?
      ○ It is compact, but it is an exclusion format; it does not tell what the site has
      ○ It assumes a single domain; patterns are for paths (not the domain name)
    ● How about well-known URIs?
      ○ Good for automated discovery of domain-specific metadata resources
    ● How about combining these ideas?
      ○ Introducing MementoMap!
  • 16. Memento Lookup Routing
    Let us fix the broadcasting issue with better-informed routing.
  • 17. Archive Profiling Strategies
    ● Complete URI-R Profiling (1 URI-R = 1 Profile Key) [Sanderson et al., TPDL 2012]
      ○ bbc.co.uk/images/logo.png?w=90
      ○ cnn.com/2014/03/15/?id=128734
    ● TLD-Only Profiling (1 TLD = 1 Profile Key) [AlSum et al., TPDL 2013]
      ○ *.com
      ○ *.uk
    ● Middle Ground
      ○ *.cnn.com
      ○ *.co.uk
      ○ *.bbc.co.uk
      ○ bbc.co.uk/images/*
    We explore these strategies in this work.
    Top three archives after IA produce full TimeMaps 52% of the time.
  • 18. MementoMap Framework Components
    ● Ingestion
      ○ CDX files/API
      ○ Fulltext search
      ○ Access logs
      ○ Sample URIs
    ● Summarization and Serialization
      ○ Resource constraints
      ○ Application-specific variants
    ● Memento Routing
      ○ Integration with aggregators
  • 19. What is Archived in Arquivo.pt? What is Accessed from MemGator?
    2B URI-Rs that have 1-9 mementos each in Arquivo.pt were never requested from ODU’s MemGator server.
    43 URI-Rs were requested thousands of times each, but had zero mementos in Arquivo.pt.
    45 URI-Rs had tens of mementos each that were requested hundreds of times.
  • 20. What is Archived in Arquivo.pt? What is Accessed from MemGator?
    Figure labels: Blind spot of a usage-based profile; Blind spot of a content-based profile
  • 21. Who Bears the Cost of Bad Routing Decisions?

                                          Actual: Present in the Archive   Actual: Not in the Archive
    Predicted: Routed to the Archive      True Positive (TP)               False Positive (FP)
    Predicted: Not Routed to the Archive  False Negative (FN)              True Negative (TN)

    FP: Wasteful (Infrastructure suffers)
    FN: Disuse (Users suffer)
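The confusion matrix on slide 21 can be turned into a small tally of routing outcomes. The sketch below is illustrative only: the function names and the input shape (pairs of "was routed" and "was actually present") are assumptions, and accuracy is computed with the standard (TP + TN) / total definition, which may differ from the metric used in the cited papers.

```python
from collections import Counter

def classify(routed, present):
    """Map one routing decision to a confusion-matrix cell."""
    if routed:
        return "TP" if present else "FP"
    return "FN" if present else "TN"

def summarize(lookups):
    """lookups: iterable of (routed, present) boolean pairs."""
    counts = Counter(classify(routed, present) for routed, present in lookups)
    total = sum(counts.values())
    return {
        "accuracy": (counts["TP"] + counts["TN"]) / total if total else 0.0,
        "wasted_requests": counts["FP"],   # infrastructure suffers
        "missed_mementos": counts["FN"],   # users suffer
    }

if __name__ == "__main__":
    print(summarize([(True, True), (True, False), (False, True), (False, False)]))
```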
  • 22. URI Canonicalization and SURT
    Original URIs:
      https://news.bbc.co.uk/images/Logo.png?width=200&height=80&rotate=90%C2%B0#top
      http://www.news.BBC.co.uk/images/Logo.png?width=200&height=80&rotate=90%c2%b0#top
      http://www.news.bbc.co.uk/images/Logo.png?rotate=90%c2%B0&width=200&height=80
      http://NEWS.BBC.CO.UK:80//images//Logo.png?height=80&width=200&rotate=90%c2%b0#top
    Canonicalization:
      news.bbc.co.uk/images/Logo.png?height=80&rotate=90%C2%B0&width=200
    SURT:
      uk,co,bbc,news,)/images/logo.png?height=80&rotate=90%c2%b0&width=200
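A simplified sketch of the transformation on slide 22 follows. It is not the canonicalizer used by any particular archive or by the MementoMap CLI; the function name `to_surt` and the exact rule set (drop scheme, fragment, leading `www.`, and default ports; collapse repeated slashes; sort query parameters; lowercase everything) are assumptions chosen so that the four URIs on the slide collapse to the SURT shown there.

```python
import re
from urllib.parse import urlsplit

def to_surt(uri, trailing_comma=True):
    """Simplified canonicalization + SURT conversion (illustrative only)."""
    parts = urlsplit(uri)                      # scheme and fragment are discarded
    host = parts.hostname or ""                # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    labels = ",".join(reversed(host.split(".")))
    if trailing_comma:                         # slide 22 style: "uk,co,bbc,news,)"
        labels += ","
    if parts.port and parts.port not in (80, 443):
        labels += ":%d" % parts.port           # keep only non-default ports
    path = re.sub(r"/{2,}", "/", parts.path) or "/"
    surt = labels + ")" + path
    if parts.query:
        surt += "?" + "&".join(sorted(parts.query.split("&")))
    return surt.lower()

if __name__ == "__main__":
    print(to_surt("http://NEWS.BBC.CO.UK:80//images//Logo.png"
                  "?height=80&width=200&rotate=90%c2%b0#top"))
```

Passing `trailing_comma=False` yields keys in the `org,arxiv)/` style used in the MementoMap examples elsewhere in the deck.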
  • 24. SURT Representation With Wildcard
    Original SURTs did not have wildcards. We introduced them for dynamic profiling.
    In practice, the common “http://(” prefix is removed.
  • 25. Shape of URI Key Tree of Arquivo.pt
    Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
  • 26. A MementoMap Example

    !context ["http://oduwsdl.github.io/contexts/ukvs"]
    !id {uri: "http://archive.example.org/"}
    !fields {keys: ["surt"], values: ["frequency"]}
    !meta {type: "MementoMap", name: "A Test Web Archive", year: 1996}
    !meta {updated_at: "2018-09-03T13:27:52Z"}
    * 54321/20000
    com,* 10000+
    org,arxiv)/ 100
    org,arxiv)/* 2500~/900
    org,arxiv)/pdf/* 0
    uk,co,bbc)/images/* 300+/20-

    https://github.com/oduwsdl/ORS/blob/master/ukvs.md
    Goodbye HmPn/DLim static profiling policies, thanks to our SURT with wildcard.
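One way to read the example on slide 26 is as a most-specific-key-wins lookup table. The sketch below illustrates that idea under stated assumptions: it is not the lookup algorithm of the `mementomap` CLI, the names `candidate_keys` and `lookup` are hypothetical, SURTs are assumed to use the `org,arxiv)/` style without a trailing comma, and the frequency values are kept as opaque strings rather than interpreted.

```python
# Data lines copied from the slide 26 example; values are left uninterpreted.
MEMENTOMAP = {
    "*": "54321/20000",
    "com,*": "10000+",
    "org,arxiv)/": "100",
    "org,arxiv)/*": "2500~/900",
    "org,arxiv)/pdf/*": "0",
    "uk,co,bbc)/images/*": "300+/20-",
}

def candidate_keys(surt):
    """Yield keys that could cover `surt`, from most to least specific."""
    host, _, path = surt.partition(")")
    yield surt                                        # exact key
    segments = [s for s in path.split("/") if s]
    for i in range(len(segments) - 1, -1, -1):        # path wildcards
        yield host + ")/" + "/".join(segments[:i] + ["*"])
    labels = host.split(",")
    for i in range(len(labels), 0, -1):               # host wildcards
        yield ",".join(labels[:i] + ["*"])
    yield "*"                                         # catch-all

def lookup(surt, mementomap):
    """Return (matched_key, value) for the most specific key, else None."""
    for key in candidate_keys(surt):
        if key in mementomap:
            return key, mementomap[key]
    return None

if __name__ == "__main__":
    for surt in ("org,arxiv)/pdf/1234", "com,cnn)/2014/03/15", "net,example)/"):
        print(surt, "->", lookup(surt, MEMENTOMAP))
```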
  • 27. MementoMap
    https://github.com/oduwsdl/MementoMap

    $ mementomap
    Usage:
      mementomap [-h] {generate,compact,lookup,batchlookup} ...

    Positional Arguments:
      {generate,compact,lookup,batchlookup}
        generate      Generate a MementoMap from a sorted file with the first columns as SURT (e.g., CDX/CDXJ)
        compact       Compact a large MementoMap file into a small one
        lookup        Search for a URI/SURT into a MementoMap
        batchlookup   Search for a list of URIs/SURTs into a MementoMap

    Optional Arguments:
      -h, --help      Show this help message and exit

    Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
  • 28. Processed Lines vs. Compacted MementoMap Growth

    com,example)/a/1/x
    com,example)/a/2
    com,example)/a/3
    com,example)/b/1
    com,example)/b/2
    com,example)/c/1

    com,example)/a/*
    com,example)/b/1
    com,example)/b/2
    com,example)/c/1

    com,example)/*

    Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
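The three key lists on slide 28 suggest a roll-up of sibling keys into a wildcard parent. The sketch below reproduces that progression with a plain child-count threshold. This is a deliberate simplification and an assumption: the real `mementomap` tool is driven by its `--hcf` and `--pcf` factors, whose exact behavior is not described in this transcript, and `rollup` with its `depth` and `max_children` parameters is a hypothetical helper.

```python
from collections import defaultdict

def rollup(keys, depth, max_children):
    """Replace groups of keys that share the first `depth` path segments
    with a single wildcard key when a group has more than `max_children`
    members. Keys are SURTs such as "com,example)/a/1/x"."""
    groups = defaultdict(list)
    for key in keys:
        host, _, path = key.partition(")/")
        segments = path.split("/")
        prefix = host + ")/" + "".join(s + "/" for s in segments[:depth])
        groups[prefix].append(key)
    compacted = []
    for prefix, members in groups.items():
        if len(members) > max_children:
            compacted.append(prefix + "*")    # summarize the whole subtree
        else:
            compacted.extend(members)         # keep the specific keys
    return sorted(compacted)

KEYS = [
    "com,example)/a/1/x", "com,example)/a/2", "com,example)/a/3",
    "com,example)/b/1", "com,example)/b/2", "com,example)/c/1",
]

if __name__ == "__main__":
    step1 = rollup(KEYS, depth=1, max_children=2)
    print(step1)
    print(rollup(step1, depth=0, max_children=2))
```

Rolling up loses detail: after the second pass, a lookup for a path the archive never held still matches `com,example)/*`, which is the trade-off between profile size and routing accuracy that the deck discusses.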
  • 29. MementoMap Generation, Compaction, and Lookup
    1.5% Relative Cost yields 60% Accuracy.
    Arquivo.pt can save 60% wasted traffic by publishing a 119MB summary file!
    Alam et al., “MementoMap Framework for Flexible and Adaptive Web Archive Profiling”, JCDL 2019
  • 30. Why Profile Archival Voids?

    $ curl -I https://web.archive.org/web/https://quora.com/
    HTTP/1.1 403 FORBIDDEN
    Server: nginx/1.15.8
    Date: Wed, 02 Dec 2020 20:39:33 GMT
    Content-Type: text/html; charset=utf-8
    Connection: keep-alive
    Server-Timing: captures_list;dur=0.150497
    X-App-Server: wwwb-app58
    X-ts: 403

    The Internet Archive has many “*.com” domains, but it may not want to capture or replay some.
    Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
  • 31. Archival Voids Profiles Reduce False Positives

    org,arxiv)/abs/a 40
    org,arxiv)/abs/b 23
    org,arxiv)/abs/c 17
    org,arxiv)/format/a 15
    org,arxiv)/format/b 20
    org,arxiv)/format/c 10
    org,arxiv)/search/a 30
    ...

    org,arxiv)/abs/* 80
    org,arxiv)/format/* 45
    org,arxiv)/search/* 60

    org,arxiv)/* 185

    org,arxiv)/abs/d                                     False Positive
    org,arxiv)/pdf/a, org,arxiv)/pdf/b, org,arxiv)/pdf/c False Positive

    org,arxiv)/* 185
    org,arxiv)/pdf/* 0

    How about summarizing frequently accessed URIs an archive does not hold?
    Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
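The effect shown on slide 31 can be sketched as a routing check that consults a holdings profile and a separate voids profile, with the more specific key winning. This is an illustration under assumptions, not the routing logic of MemGator or any other aggregator; `longest_match` and `should_route` are hypothetical names, and matching is reduced to simple prefix tests on wildcard keys.

```python
def longest_match(surt, keys):
    """Return the most specific key (exact or trailing-* prefix) covering surt."""
    best = None
    for key in keys:
        covers = key == surt or (key.endswith("*") and surt.startswith(key[:-1]))
        if covers and (best is None or len(key) > len(best)):
            best = key
    return best

def should_route(surt, holdings, voids):
    """Route only if the holdings profile covers the SURT and no equally or
    more specific archival-void key says the archive lacks it."""
    held = longest_match(surt, holdings)
    if held is None or holdings[held] <= 0:
        return False
    void = longest_match(surt, voids)
    if void is not None and len(void) >= len(held):
        return False
    return True

HOLDINGS = {"org,arxiv)/*": 185}      # rolled-up holdings from the slide
VOIDS = {"org,arxiv)/pdf/*"}          # separate archival voids profile

if __name__ == "__main__":
    for surt in ("org,arxiv)/pdf/a", "org,arxiv)/abs/d", "com,example)/"):
        print(surt, "->", should_route(surt, HOLDINGS, VOIDS))
```

With these inputs the `pdf` lookups are no longer routed, while `org,arxiv)/abs/d` still is, so that false positive remains unless the holdings profile keeps more specific keys.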
  • 32. 404-Only Frequencies and Request Savings
    An archival voids profile of 2.4k URIs that were each accessed hundreds of times or more could have saved about 8.4% of wasted requests.
    Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
  • 33. Archival Voids Recommendations
    ● Keep archival voids profiles separate from archival holdings
    ● Update often
    ● Use specific keys with only high confidence
    ● Profile only resources that are high in demand
    ● Archives themselves are better sources of truth than external observers
    Alam et al., “Profiling Web Archival Voids for Memento Routing”, JCDL 2021
  • 34. Dissemination and Discovery Methods
    Well-known URI:
      GET /.well-known/mementomap HTTP/1.1
      Host: arquivo.pt
    Link Header:
      Link: <https://arquivo.pt/path/to/mementomap.ukvs>; rel="mementomap"
    Link HTML Element:
      <link href="https://arquivo.pt/path/to/mementomap.ukvs" rel="mementomap">
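A client could try the discovery methods on slide 34 roughly as sketched below. The well-known path and the `mementomap` link relation are taken from the slide as proposed there; the helper names are hypothetical, the Link header parsing is deliberately simplified rather than a full RFC 8288 parser, and no network request is made.

```python
import re
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/mementomap"   # path as proposed on the slide

def well_known_uri(archive_base):
    """Build the well-known MementoMap URI for an archive's base URI."""
    return urljoin(archive_base, WELL_KNOWN_PATH)

def mementomap_from_link_header(header_value):
    """Return the first Link target whose rel includes "mementomap", else None."""
    for target, params in re.findall(r"<([^>]+)>([^<]*)", header_value):
        rel = re.search(r'rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params)
        if rel and "mementomap" in (rel.group(1) or rel.group(2)).split():
            return target
    return None

if __name__ == "__main__":
    print(well_known_uri("https://arquivo.pt/"))
    print(mementomap_from_link_header(
        '<https://arquivo.pt/path/to/mementomap.ukvs>; rel="mementomap"'))
```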
  • 35. MementoMap Adoption Path
    ● PWA, UKWA, and NLA have shown interest
    ● PyWB archival replay system is open for implementation
    ● MemGator and LANL’s Time Travel service are interested
    ● Big web archives can start with publishing archival voids
      ○ No need to profile IA
    ● Archives with access restrictions can have multiple MementoMaps
    ● Third parties can create and publish MementoMaps of the rest of the archives while they catch up
    ● Coexist with the ongoing IIPC-funded Bloom filters project
  • 36. MementoMap Call for Adoption
    🕮 MementoMap Framework (Doctoral Dissertation) https://digitalcommons.odu.edu/computerscience_etds/129/
       Unified Key Value Store (UKVS) https://github.com/oduwsdl/ORS/blob/master/ukvs.md
    ⚙ MementoMap CLI https://github.com/oduwsdl/MementoMap
       MemGator https://github.com/oduwsdl/MemGator

    $ mementomap generate --hcf=4.0 --pcf=2.0 index.cdx[j] mementomap.ukvs
    # Provide sorted list of SURTs to STDIN if not using CDX[J] index
    $ scp mementomap.ukvs ${WEBHOST}:${WEBROOT}/.well-known/mementomap
    # Preferably, compress the file and allow content negotiation

    ✉ Email: sawood@archive.org
       Twitter: @ibnesayeed
       IIPC Slack: #mementomap