EVALUATING SLIDING AND
STICKY TARGET POLICIES
BY MEASURING TEMPORAL
DRIFT IN ACYCLIC WALKS
THROUGH A WEB ARCHIVE
SCOTT G. AINSWORTH
MICHAEL L. NELSON
OLD DOMINION UNIVERSITY
COMPUTER SCIENCE
JCDL 2013
JULY 23-25, 2013
INDIANAPOLIS, INDIANA USA
Joint Conference on Digital Libraries (JCDL) 2013
A FABLE FROM WAYBACK
7/23/13 Scott G. Ainsworth • Michael L. Nelson
A long, long time ago…
ODU Computer Science updated its web site…
What did it look like?
May 2005...
A FABLE FROM WAYBACK
A FABLE FROM WAYBACK
A FABLE FROM WAYBACK
A FABLE FROM WAYBACK
WHAT JUST HAPPENED?
WHAT WE EXPECTED
2005-05-14 @ 01:36:08
WHAT WE GOT
2005-03-31 @ 09:16:10
SLIDING TARGET
2005-05-14 01:36:08
SLIDING TARGET
2005-04-22 00:17:52
SLIDING TARGET
2005-03-31 09:16:10
STICKY TARGET
What if the target is held steady?
(Enabled by Memento API)
MEMENTO HTTP EXTENSION*
Adds ability to request a particular date-time
Enables Sticky Target
Request:
GET <timegate>/http://www.cs.odu.edu/ HTTP/1.1
…
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
…
Response:
HTTP/1.1 200 OK
…
Memento-Datetime: Sat, 14 May 2005 01:36:08 GMT
…
*https://datatracker.ietf.org/doc/draft-vandesompel-memento/
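The two headers above can be exercised with Python's standard library alone. This is a minimal sketch and not the authors' tooling. `accept_datetime` and `drift_days` are hypothetical helper names. No TimeGate is contacted, because the slide leaves `<timegate>` unspecified. The sketch formats a target datetime as an HTTP-date for `Accept-Datetime`, then measures how far a returned `Memento-Datetime` lies from the requested datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime(target: datetime) -> str:
    """Format an aware datetime as an HTTP-date (RFC 1123 form, GMT)."""
    return format_datetime(target.astimezone(timezone.utc), usegmt=True)


def drift_days(requested_header: str, memento_header: str) -> float:
    """Absolute gap, in days, between the requested and returned datetimes."""
    requested = parsedate_to_datetime(requested_header)
    received = parsedate_to_datetime(memento_header)
    return abs((received - requested).total_seconds()) / 86400


header = accept_datetime(datetime(2005, 5, 10, 11, 21, 0, tzinfo=timezone.utc))
print("Accept-Datetime:", header)  # weekday name is derived from the date
print(round(drift_days(header, "Sat, 14 May 2005 01:36:08 GMT"), 1))  # 3.6
```

`format_datetime(..., usegmt=True)` computes the day name from the date. This keeps hand-typed weekday names from falling out of sync with the date.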
2005-05-14
2005-05-14 01:36:08
STICKY TARGET
MementoFox extension
STICKY TARGET
2005-04-22 00:17:52
STICKY TARGET
2005-05-14 01:36:08
DRIFT COMPARISON
Page            Sliding Datetime      Sliding Drift             Sticky Datetime       Sticky Drift
CS Home         2005-05-14 01:36:08   –                         2005-05-14 01:36:08   –
Science Home    2005-04-22 00:17:52   22.1 days                 2005-04-22 00:17:52   22.1 days
CS Home         2005-03-31 09:16:10   43.7 days (+21.6 days)    2005-05-14 01:36:08   –
Mean                                  32.9 days                                       11.0 days
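A short Python sketch can check the arithmetic behind the table. It makes two assumptions that the table appears to make. First, drift is measured from the first step's Memento-Datetime (2005-05-14 01:36:08). Second, each mean covers steps 2 and 3 only, because the first row has no drift entry.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
TARGET = datetime.strptime("2005-05-14 01:36:08", FMT)  # step-1 Memento-Datetime


def drift_days(memento: str) -> float:
    """|target-datetime_1 - Memento-Datetime_i| in days."""
    return abs((TARGET - datetime.strptime(memento, FMT)).total_seconds()) / 86400


# Memento-Datetimes for steps 2 and 3 of the fable (step 1 is the target itself).
sliding = [drift_days("2005-04-22 00:17:52"), drift_days("2005-03-31 09:16:10")]
sticky = [drift_days("2005-04-22 00:17:52"), drift_days("2005-05-14 01:36:08")]

for name, drifts in (("Sliding", sliding), ("Sticky", sticky)):
    mean = sum(drifts) / len(drifts)
    print(name, [round(d, 1) for d in drifts], "mean", round(mean, 1))
# Sliding [22.1, 43.7] mean 32.9
# Sticky [22.1, 0.0] mean 11.0

# The "+21.6 days" annotation: extra drift added by the sliding step 3.
print(round(sliding[1] - sliding[0], 1))  # 21.6
```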
QUESTIONS
How much temporal drift occurs under each of the two policies?
Does the sticky policy reduce drift as expected?
If so, by how much?
How do the following influence drift?
• Choice (number of links)
• Domains visited
• Walk length
CONTENTS
• Motivation
• Related work
• Measuring drift
• Results
• Future work & conclusions
RELATED WORK
Controlling Crawl Data Quality: Future Collections
• Spaniol et al. – crawling strategy
• Denev et al. – change rates by MIME type and depth
• Ben Saad et al. – crawl metadata used to select the best results from the archive
Our Focus: Quality of Existing Data
• Existing collections
• Datetime selection policies
CONTENTS
• Motivation
• Related work
• Measuring drift
• Results
• Future work & conclusions
DEFINITIONS
Walk Length
Number of successful steps (HTTP 200 responses)
Unique Domains
Number of unique domains (jcdl.org, amazon.com, etc.)
Choice
Number of unique links (calculated per page)
Drift
$\lvert \text{target-datetime}_1 - \text{Memento-Datetime}_i \rvert$ for step $i$
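The definitions above map onto a small accumulator. All names in this Python sketch are hypothetical. It approximates a "domain" as the last two host labels. That rule matches the walk example's counting, where sci.odu.edu and www.odu.edu count as one domain, but it would mishandle suffixes such as co.uk. The step-2 URI is assumed to be http://sci.odu.edu/, as in the fable. The link counts (48 and 36) come from the status slides of the walk example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set
from urllib.parse import urlsplit


def registered_domain(uri: str) -> str:
    """Crude approximation: keep the last two host labels (www.cs.odu.edu -> odu.edu)."""
    host = urlsplit(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


@dataclass
class WalkStats:
    target: datetime  # target-datetime_1: Memento-Datetime of the first step
    steps: int = 0  # walk length: successful (HTTP 200) steps
    domains: Set[str] = field(default_factory=set)
    choice: int = 0  # unique links, counted per page and summed over the walk
    drifts: List[float] = field(default_factory=list)  # drift in days, one per step

    def add_step(self, uri: str, unique_links: int, memento_datetime: datetime) -> None:
        self.steps += 1
        self.domains.add(registered_domain(uri))
        self.choice += unique_links
        gap = abs((self.target - memento_datetime).total_seconds())
        self.drifts.append(gap / 86400)

    @property
    def mean_drift(self) -> float:
        return sum(self.drifts) / len(self.drifts) if self.drifts else 0.0


start = datetime(2005, 5, 14, 1, 36, 8)
walk = WalkStats(target=start)
walk.add_step("http://www.cs.odu.edu/", 48, start)
walk.add_step("http://sci.odu.edu/", 36, datetime(2005, 4, 22, 0, 17, 52))  # assumed URI
print(walk.steps, len(walk.domains), walk.choice, round(walk.mean_drift, 1))  # 2 1 84 11.0
```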
PROCESS BY EXAMPLE
Select a URI
• Random selection of 1 out of 4,000
4,000 sample URIs – the same as in the JCDL 2011 paper
• DMOZ – a reference
• Search Engines – best random sampling
• Bitly – does shortening have an impact?
• Delicious – does popularity have an impact?
“How Much of the Web Is Archived?”
http://arxiv.org/abs/1212.6177
PROCESS BY EXAMPLE
First, select a URI
• Random selection of 1 out of 4,000
Second, download timemap
<http://api.wayback.archive.org/memento/20050507093740/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 07 May 2005 09:37:40 GMT",
<http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 14 May 2005 01:36:08 GMT",
<http://api.wayback.archive.org/memento/20050515002903/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sun, 15 May 2005 00:29:03 GMT",
<http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 14 May 2005 01:36:08 GMT",
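A regular expression can parse a timemap like the excerpt above. This Python sketch is a simplification. It assumes each entry lists `rel` before `datetime`, as the excerpt does; a full link-format parser would accept parameters in any order. The excerpt lists the 20050514013608 memento twice, so the sketch keys entries by URI-M and keeps one of each. `parse_timemap` is a hypothetical name.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Tuple

# One entry as laid out in the excerpt: <URI-M>; rel="..."; datetime="..."
ENTRY = re.compile(r'<([^>]+)>;\s*rel="([^"]*)";\s*datetime="([^"]+)"')


def parse_timemap(text: str) -> List[Tuple[datetime, str]]:
    """Return (Memento-Datetime, URI-M) pairs, one per URI-M, oldest first."""
    mementos = {}
    for match in ENTRY.finditer(text):
        uri, rel, when = match.groups()
        if "memento" in rel.split():  # also accepts e.g. rel="first memento"
            mementos[uri] = parsedate_to_datetime(when)
    return sorted((when, uri) for uri, when in mementos.items())


SAMPLE = """
<http://api.wayback.archive.org/memento/20050507093740/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 07 May 2005 09:37:40 GMT",
<http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 14 May 2005 01:36:08 GMT",
<http://api.wayback.archive.org/memento/20050515002903/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sun, 15 May 2005 00:29:03 GMT",
<http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/>;
rel="memento";
datetime="Sat, 14 May 2005 01:36:08 GMT",
"""

for when, uri in parse_timemap(SAMPLE):
    print(when.isoformat(), uri)
```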
PROCESS BY EXAMPLE
Next, download both mementos
Wayback Machine Memento API
PROCESS BY EXAMPLE
Next, download both mementos and find common links
Wayback Machine Memento API
STATUS SO FAR
Successful Steps: 1
Unique Domains: 1
Choice: 48
Mean Drift (days): 0.0 (WB), 0.0 (API)
PROCESS BY EXAMPLE
Find common links and select one for the next step
Wayback Machine Memento API
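The standard library's `HTMLParser` is enough to sketch the search for common links. Archived pages carry rewritten links, so this Python sketch maps each link back to its original URI before it intersects the two sets. Two details are assumptions made for illustration: the rewrite pattern (`…/<14-digit timestamp>/<original URI>`, with an optional `xx_` modifier) and the `web.archive.org/web/` base URI. `pick_next` is a hypothetical name.

```python
import random
import re
from html.parser import HTMLParser
from typing import Optional, Set
from urllib.parse import urljoin

# Assumed archive rewriting: .../<14-digit timestamp>[xx_]/<original URI>
ARCHIVED = re.compile(r"/\d{14}(?:[a-z]{2}_)?/(https?://.+)$")


class LinkCollector(HTMLParser):
    """Collect <a href> targets, mapped back to their original (live-web) URIs."""

    def __init__(self, base: str) -> None:
        super().__init__()
        self.base = base
        self.links: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base, href)
        match = ARCHIVED.search(absolute)
        self.links.add(match.group(1) if match else absolute)


def original_links(html: str, base: str) -> Set[str]:
    collector = LinkCollector(base)
    collector.feed(html)
    collector.close()
    return collector.links


def pick_next(html_a: str, base_a: str, html_b: str, base_b: str,
              rng: Optional[random.Random] = None) -> Optional[str]:
    """Randomly choose a link present in both mementos, or None if there is none."""
    common = original_links(html_a, base_a) & original_links(html_b, base_b)
    if not common:
        return None
    rng = rng or random.Random()
    return rng.choice(sorted(common))


WB_BASE = "http://web.archive.org/web/20050514013608/http://www.cs.odu.edu/"
API_BASE = "http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/"
html_wb = ('<a href="/web/20050514013608/http://www.odu.edu/">ODU</a>'
           '<a href="/web/20050514013608/http://www.vtext.com/">vtext</a>')
html_api = ('<a href="http://api.wayback.archive.org/memento/20050514013608/http://www.odu.edu/">ODU</a>'
            '<a href="/memento/20050514013608/http://sci.odu.edu/">Science</a>')
print(pick_next(html_wb, WB_BASE, html_api, API_BASE))  # http://www.odu.edu/
```

When the intersection is empty, the sketch returns `None`. This corresponds to the "No Common Links" stop cause.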
PROCESS BY EXAMPLE
The timemap is downloaded, the best datetimes are selected, and the mementos are downloaded…
Wayback Machine Memento API
Successful Steps: 1 + 1 = 2
Unique Domains: 1 + 0 = 1
Choice: 48 + 36 = 84
Mean Drift (days): 11.0 (WB), 11.0 (API)
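The slides do not spell out how the "best" datetime is chosen. This sketch assumes a common rule: pick the memento nearest the target in absolute time. An archive's TimeGate or the Wayback Machine interface may apply a different rule. The Python sketch runs a binary search over a sorted timemap, using the three distinct cs.odu.edu entries from the timemap excerpt. `closest_memento` is a hypothetical name.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List, Tuple

Memento = Tuple[datetime, str]  # (Memento-Datetime, URI-M)


def closest_memento(target: datetime, timemap: List[Memento]) -> Memento:
    """Pick the memento nearest to target; timemap must be non-empty and sorted."""
    times = [when for when, _ in timemap]
    i = bisect_left(times, target)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = timemap[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda m: abs(m[0] - target))


cs_home = [
    (datetime(2005, 5, 7, 9, 37, 40),
     "http://api.wayback.archive.org/memento/20050507093740/http://www.cs.odu.edu/"),
    (datetime(2005, 5, 14, 1, 36, 8),
     "http://api.wayback.archive.org/memento/20050514013608/http://www.cs.odu.edu/"),
    (datetime(2005, 5, 15, 0, 29, 3),
     "http://api.wayback.archive.org/memento/20050515002903/http://www.cs.odu.edu/"),
]
print(closest_memento(datetime(2005, 5, 14, 12, 0, 0), cs_home)[1])
```

Ties go to the earlier memento, because `min` returns the first of several equal candidates.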
PROCESS BY EXAMPLE
Again for http://www.odu.edu
Wayback Machine Memento API
Successful Steps: 2 + 1 = 3
Unique Domains: 1 + 0 = 1
Choice: 84 + 33 = 117
Mean Drift (days): 14.7 (WB), 7.4 (API)
PROCESS BY EXAMPLE
And for http://www.odusports.com
Redirected at acquisition time
HTTP Response:
• 302 Redirect
• Location header
Wayback Machine Memento API
PROCESS BY EXAMPLE
And for http://odusports.collegesports.com
Wayback Machine Memento API
Successful Steps: 3 + 1 = 4
Unique Domains: 1 + 1 = 2
Choice: 117 + 77 = 194
Mean Drift (days): 18.2 (WB), 7.3 (API)
PROCESS BY EXAMPLE
And for http://www.vtext.com
Wayback Machine Memento API
Successful Steps: 4 + 1 = 5
Unique Domains: 2 + 1 = 3
Choice: 194 + 14 = 208
Mean Drift (days): 20.3 (WB), 5.8 (API)
PROCESS BY EXAMPLE
And 404 stops the walk
Wayback Machine Memento API
HTTP Response:
• 404 Not Found
Successful Steps: 4 + 1 = 5
Unique Domains: 2 + 1 = 3
Choice: 194 + 14 = 208
Mean Drift (days): 20.3 (WB), 5.8 (API)
STOP CAUSES
                         First Step          Subsequent Steps
Stop Cause               Count    Percent    Count     Percent
Timemaps
  HTTP 403                  74       1.7%     4,803       9.1%
  HTTP 404               1,327      30.1%    15,850      29.9%
  HTTP 503                   0       0.0%        43       0.1%
  Other                      2       0.0%       180       0.3%
Mementos
  HTTP 403                  52       1.2%       476       0.9%
  HTTP 404                 215       4.9%     3,633       6.8%
  HTTP 503               1,957      44.4%    10,535      19.9%
  Download failed          154       3.5%       589       1.1%
  Not HTML                 514      11.7%     2,856       5.4%
  No Common Links            0       0.0%    12,957      24.4%
  Other                    117       2.7%     1,128       2.1%
Totals                   4,412               53,050
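Each percentage is a count divided by its column total: 4,412 first steps and 53,050 subsequent steps. A short Python check recomputes the percentages from the counts copied out of the table. The dictionary layout is just one convenient encoding.

```python
# (where, stop cause) -> (first-step count, subsequent-step count), from the table
STOP_CAUSES = {
    ("Timemaps", "HTTP 403"): (74, 4_803),
    ("Timemaps", "HTTP 404"): (1_327, 15_850),
    ("Timemaps", "HTTP 503"): (0, 43),
    ("Timemaps", "Other"): (2, 180),
    ("Mementos", "HTTP 403"): (52, 476),
    ("Mementos", "HTTP 404"): (215, 3_633),
    ("Mementos", "HTTP 503"): (1_957, 10_535),
    ("Mementos", "Download failed"): (154, 589),
    ("Mementos", "Not HTML"): (514, 2_856),
    ("Mementos", "No Common Links"): (0, 12_957),
    ("Mementos", "Other"): (117, 1_128),
}


def column_percentages(column: int):
    """Return (total, {cause: percent}) for column 0 (first step) or 1 (subsequent)."""
    total = sum(counts[column] for counts in STOP_CAUSES.values())
    pct = {cause: round(100 * counts[column] / total, 1)
           for cause, counts in STOP_CAUSES.items()}
    return total, pct


for column, label in ((0, "First step"), (1, "Subsequent steps")):
    total, pct = column_percentages(column)
    print(f"{label}: total {total:,}")
    for (where, cause), value in pct.items():
        print(f"  {where:<9} {cause:<16} {value:5.1f}%")
```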
CONTENTS
• Motivation
• Related work
• Measuring drift
• Results
• Future work & conclusions
WALKS AND STEPS
Status                        Total
Walks attempted             200,000
Unique walks                 53,100
Successful walks             48,685
Pct. successful               91.7%
Steps                       240,439
Successful steps            187,371
  with drift > 1 year         6,701
  with drift > 5 years          111
Successful steps/walk           3.8
WALK LENGTHS
[Figure: walk length (1 to 50) vs. occurrences (log scale, 1 to 10,000)]
MEDIAN DRIFT BY STEP
[Figure: Median Drift by Step. Median drift (months, 0 to 3) vs. step number (1 to 50); series: Sliding (UI) and Sticky (API).]
DRIFT BY STEP
[Figure: Drift by Step, two panels. Left: Sliding policy (UI); right: Sticky policy (API). Drift (years, up to 10) vs. step number; series: at least 1, 8, 64, 512, 4,096, and 32,768 mementos.]
DRIFT BY CHOICE
[Figure: mean drift (months) vs. choice; series: Sliding and Sticky.]
DRIFT BY DOMAINS
[Figure: mean drift (months) vs. domain count; series: Sliding and Sticky.]
CONTENTS
• Motivation
• Related work
• Measuring drift
• Results
• Future work & conclusions
FUTURE WORK
Integrate real-world walk patterns
• AlNoamany et al. – Internet Archive logs
• Domains users avoid – link farms, etc.
• Domain clusters
• Self-referencing domains – 101celebrities.com
Check other archives
• Other archives now support the Memento API
CONCLUSIONS
30 days less drift with the Sticky policy.
The Sticky policy controls drift; the Sliding policy does not.
BACKUP
WALK LENGTHS
Walk Length    DMOZ    S.Eng.   Delicious   Bitly    Total
1             5,355    1,239      7,139     1,289   15,076
2             3,571      924      4,857       817   10,169
3             1,891      598      3,311       623    6,423
4             1,212      381      2,228       415    4,236
5               791      315      1,588       314    3,008
6               583      232      1,168       259    2,242
7               417      178        877       186    1,658
8               258      153        651       136    1,198
9               187      111        498       108      904
10              144       79        337        79      679
…
20               14       10         36         9       76
…
41-45             6        2         14         2       24
46-50             6        3          6         1       16
MEAN DRIFT BY STEP
[Figure: Mean Drift by Step. Mean drift (months, 0 to 7) vs. step number (1 to 50); series: Sliding (UI) and Sticky (API); markers show μ (mean) and σ (standard deviation).]
SLIDING TARGET
⟹ GET …/20050514013608/http://www.cs.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
⟹ GET …/20050514013608/http://sci.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 302 Found
Location: …/20050422001752/http://sci.odu.edu/
⟹ GET …/20050422001752/http://sci.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
⟹ GET …/20050422001752/http://www.cs.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 302 Found
Location: …/20050331091610/http://www.cs.odu.edu/
⟹ GET …/20050331091610/http://www.cs.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
22 Days
44 Days
STICKY TARGET (MEMENTO)
⟹ GET <timegate>/http://www.cs.odu.edu/ HTTP/1.1
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
⟸ HTTP/1.1 302 Found
Location: …/20050514013608/http://www.cs.odu.edu/
⟹ GET …/20050514013608/http://www.cs.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
⟹ GET <timegate>/http://sci.odu.edu/ HTTP/1.1
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
⟸ HTTP/1.1 302 Found
Location: …/20050422001752/http://sci.odu.edu/
⟹ GET …/20050422001752/http://sci.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
⟹ GET <timegate>/http://www.cs.odu.edu/ HTTP/1.1
Accept-Datetime: Tue, 10 May 2005 11:21:00 GMT
⟸ HTTP/1.1 302 Found
Location: …/20050514013608/http://www.cs.odu.edu/
⟹ GET …/20050514013608/http://www.cs.odu.edu/ HTTP/1.1
⟸ HTTP/1.1 200 OK
22 Days
0 Days
TWO BROWSING POLICIES
SLIDING TARGET
Target
• Resource datetime
Drift types
• Memento drift
• Target drift
STICKY TARGET
Target
• Original datetime
Drift type
• Only memento drift
TWO TYPES OF DRIFT
Target Drift
• Drift introduced by changing the target datetime
• | received-datetime – original-datetime |
Memento Drift
• Drift introduced because the exact requested datetime is not available.
• | received-datetime – requested-datetime |
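The fable's numbers make the two drift types concrete. This Python sketch takes the original datetime to be the first step's Memento-Datetime (2005-05-14 01:36:08), as in the Drift Comparison table. The function names are hypothetical. Under the sliding policy, the next request uses the datetime of the memento currently being viewed. Under the sticky policy, the next request reuses the original datetime.

```python
from datetime import datetime


def days_between(a: datetime, b: datetime) -> float:
    return abs((a - b).total_seconds()) / 86400


def requested_datetime(policy: str, original: datetime, current: datetime) -> datetime:
    """Datetime requested on the next step under each browsing policy."""
    if policy == "sticky":
        return original  # hold the original target steady
    if policy == "sliding":
        return current  # follow the datetime of the memento being viewed
    raise ValueError(f"unknown policy: {policy}")


def drift(original: datetime, requested: datetime, received: datetime) -> dict:
    return {
        "target": days_between(received, original),  # |received - original|
        "memento": days_between(received, requested),  # |received - requested|
    }


original = datetime(2005, 5, 14, 1, 36, 8)  # CS Home, step 1
science = datetime(2005, 4, 22, 0, 17, 52)  # Science Home, step 2 (being viewed)

# Step 3 back to CS Home, using the mementos received in the fable.
sliding = drift(original, requested_datetime("sliding", original, science),
                datetime(2005, 3, 31, 9, 16, 10))
sticky = drift(original, requested_datetime("sticky", original, science),
               datetime(2005, 5, 14, 1, 36, 8))
print({k: round(v, 1) for k, v in sliding.items()})  # {'target': 43.7, 'memento': 21.6}
print({k: round(v, 1) for k, v in sticky.items()})  # {'target': 0.0, 'memento': 0.0}
```

Under the sticky policy the requested and original datetimes coincide, so both formulas give the same value. That is why only memento drift remains.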
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive

Editor's Notes

  1. Please forgive the long title. Let me explain it with a fable…
  2. A student at ODU becomes curious about the history of the Computer Science Department and visits the Internet Archive’s Wayback Machine.
  3. The student enters http://www.cs.odu.edu and is shown the available dates. The student navigates to 2005 and selects 14 May @ 01:36:08.
  4. The student reviews the Computer Science page. Finding the College of Sciences link interesting, the student clicks on it.
  5. After reviewing the College of Sciences page, the student returns to the Computer Science page, and…
  6. Whoa! That’s not what was expected!
  7. What just happened? We expected the left side (2005-05-14 @ 01:36:08) but got the right side (2005-03-31 @ 09:16:10). This is the result of applying the Sliding Target Policy. Highlight the temporal drift.
  8. This is an example of the “Sliding Target Policy.” Here is how it works: we started on the May 14 page we selected. When the College of Sciences was clicked, May 14 was used as the target.
  9. April 22 was the nearest Memento (archived version). When Computer Science was clicked, April 22 was used as the target.
  10. And March 31 was the nearest Memento.
  11. “What if the target datetime is held steady instead of being allowed to drift?” The Memento extension to HTTP enables this.
  12. This is a very abbreviated introduction to the Memento API. The Memento API allows an HTTP client to negotiate a datetime. On request, the client adds the Accept-Datetime header. On reply, the server sends the Memento-Datetime header, indicating the actual datetime of the memento returned. Memento-Datetime is generally the acquisition datetime of the archived copy.
  13. A sticky target can be achieved using the MementoFox extension to Firefox. MementoFox allows the desired datetime to be entered, and it remains fixed. (CLICK) The nearest Memento is retrieved. (CLICK) In this case, it is the May 14 Computer Science page, the same one we selected using the Wayback Machine UI. When the College of Sciences is clicked… (CLICK)
  14. The April 22 page is shown again because the target datetime is still 2005-05-14, so April 22 is still the nearest. (CLICK) When Computer Science is clicked again…
  15. May 14 is shown, as expected. (PAUSE)
  16. Here is a quick comparison. Review: sticky drift is 1/3 of sliding drift.
  17. This leads to several questions: How much temporal drift can be expected? How much improvement can Sticky provide (assuming it is the policy needed)? Does Sticky always produce less drift?
  18. The rest of this presentation takes the following form: a brief discussion of related work and how this research improves our knowledge; a description of how we measured drift; a review of the results; and a quick look at how this work can be refined.
  19. The majority of work to date has focused on improving the quality of data acquisition. Spaniol et al. focused on strategy. Denev et al. looked at change rate by MIME type. Ben Saad et al. used crawl metadata to improve presentation to the user. Our focus is getting the best results from existing collections. After all, we can’t go back and “fix” past data acquisition.
  20. Let’s start with a few definitions. Walk length is the number of successful steps, that is, steps with HTTP 200 responses for both the timemap and the memento. Choice is the sum of the number of unique links at each walk step. Unique domains is the number of domains seen during the walk, such as jcdl.org or amazon.com. Independent sites within a domain were not segregated (e.g., wordpress.com counts as a single domain). Drift is the magnitude of the difference between the initial target datetime and the Memento-Datetime.
  21. Let us return to our fable, starting with the selection of the first memento. The first step of the process is selecting a URI.
  22. Next, the URI’s timemap is downloaded. Timemaps are a computer-readable form of the calendar page. (CLICK) This is a partial timemap for www.cs.odu.edu. Once we have the timemap, a memento is randomly selected. (CLICK) This is the entry for ODU CS Home on May 14, 2005 02:48:46.
  23. Next both mementos (Wayback Machine and Memento API) are downloaded.
  24. Then common links are determined. This completes the first iteration of the process. Let’s look at the statistics so far.
  25. So far we have 1 successful step, 1 unique domain (odu.edu), 42 links (choice), and no drift. (But note that zero drift is not always the case on the first step.)
  26. To start the next iteration, a link is randomly selected.
  27. Subsequent iterations are similar to the first. The only difference is that, because the target datetime may have drifted on the Wayback Machine side, two different mementos may be selected.
  28. From the College of Sciences, we go to the ODU home page. This adds a successful step but does not add a new domain. It also adds 36 additional links. Note the missing image. This is quite common but does not change drift calculations.
  29. This is an example of an acquisition-time redirect.
  30. In this case, www.odusports.com redirected to odusports.collegesports.com, which is probably a service provider.
  31. The ODU Sports page has a link to vtext.com, probably because Verizon was a sponsor.
  32. Finally, clicking on “Get It Now” stops the walk with a 404.
  33. Walks stop for many reasons. The main reasons are (CLICK ON EACH): 403: access not allowed; 404: not archived; 503: not currently available; not HTML (no links); no common links (divergent versions).
  34. Occurrences: exponential scale. Very few walks make it past the mid-20s. Mean drift: shows that sticky is 45-60 nearer on average. (CLICK) Counter to the intuition that drift decreases over time. And the standard deviation is all over the place.
  35. The data is variable enough that the median is the best measure of central tendency. The main point of this graph is that the sticky policy reins in drift, while the sliding policy allows it to continue to increase. Notes: the initial upward curve is due to choosing a known Memento-Datetime. We suspect the drop starting at steps 42+ is due to large, self-referencing sites (101celebrities.com) and clusters of related sites.
  36. Here is another look at the data. Again, blue is the sliding policy and green is sticky. Blacks and reds are high density, oranges and reds medium, and blues and greens low. An interesting note: even on the first step there is sometimes considerable drift. This happens when the archive redirects from one Memento-Datetime to another. Even though each of these graphs represents over 48K mementos, the sliding policy graph is more spread out because the drift is higher. But let’s focus on the highest-density points, those with 64 or more Mementos. Here the increased drift is clearly visible in the increased height at nearly every step.
  37. Next is drift by choice. Choice, on the horizontal axis, uses an exponential scale. Choice is the total choice per walk, so the data clusters at the lower numbers because there are more short walks. The key point is that drift does increase with choice, but not by much.
  38. The number of domains, on the other hand, has a dramatic effect on drift. Here, the horizontal axis is the number of domains in a walk, and the vertical axis is the mean drift across all walks with the same number of domains. As with walk length, the sticky policy controls drift and the sliding policy allows it to increase.
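Note 12 describes Memento datetime negotiation only at the header level. The following is a minimal sketch, using only Python's standard library, of how a client might build such a request: Accept-Datetime carries the target in HTTP-date format, and a Memento-Datetime value from a response can be parsed back for drift calculations. The TimeGate prefix `http://timegate.example.org/` and the helper names `memento_request` and `memento_datetime` are hypothetical; the request is constructed but never sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from urllib.request import Request

# Hypothetical TimeGate prefix; replace with the TimeGate of a real archive.
TIMEGATE = "http://timegate.example.org/"


def memento_request(uri_r: str, target: datetime) -> Request:
    """Build (without sending) a request asking for the memento nearest `target`."""
    # HTTP dates are always expressed in GMT (RFC 1123 format).
    accept_datetime = format_datetime(target.astimezone(timezone.utc), usegmt=True)
    return Request(TIMEGATE + uri_r, headers={"Accept-Datetime": accept_datetime})


def memento_datetime(header_value: str) -> datetime:
    """Parse a Memento-Datetime response header into an aware UTC datetime."""
    return parsedate_to_datetime(header_value)


target = datetime(2005, 5, 14, 1, 36, 8, tzinfo=timezone.utc)
req = memento_request("http://www.cs.odu.edu/", target)
print(req.full_url)
# urllib stores header names via str.capitalize(), hence "Accept-datetime".
print(req.get_header("Accept-datetime"))  # Sat, 14 May 2005 01:36:08 GMT

# Drift for a hypothetical Memento-Datetime returned by the archive.
returned = memento_datetime("Fri, 22 Apr 2005 00:17:52 GMT")
print(target - returned)
```

Because the drift is simply the difference between two aware datetimes, the same helper works whether the target is held steady (sticky) or replaced by each returned Memento-Datetime (sliding).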
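Notes 8 through 15 contrast the Sliding and Sticky Target policies in prose. The following is a minimal simulation of the fable, assuming the archive returns the memento whose datetime is nearest the target (ties going to the earlier memento, which is an assumption). The timemaps are hypothetical and simplified, built only from the three datetimes on the fable slides; the function names are illustrative.

```python
from datetime import datetime


def nearest(mementos, target):
    """Return the memento datetime closest to `target`; ties go to the earlier memento."""
    return min(mementos, key=lambda m: (abs(m - target), m))


def walk(timemaps, start, policy):
    """Return the Memento-Datetime selected at each step of a fixed click path.

    timemaps: one list of memento datetimes per page, in click order.
    policy:   "sliding" targets the previous memento's datetime;
              "sticky" keeps `start` as the target for every step.
    """
    if policy not in ("sliding", "sticky"):
        raise ValueError(f"unknown policy: {policy}")
    target, chosen = start, []
    for mementos in timemaps:
        selected = nearest(mementos, target)
        chosen.append(selected)
        if policy == "sliding":
            target = selected  # the target drifts along with each memento
    return chosen


def drift_days(chosen, start):
    """Drift = |Memento-Datetime - initial target datetime|, in days."""
    return [abs(m - start).total_seconds() / 86400 for m in chosen]


# Hypothetical, simplified timemaps built from the datetimes on the fable slides.
CS = [datetime(2005, 3, 31, 9, 16, 10), datetime(2005, 5, 14, 1, 36, 8)]
SCI = [datetime(2005, 4, 22, 0, 17, 52)]
path = [CS, SCI, CS]  # Computer Science -> College of Sciences -> Computer Science
start = datetime(2005, 5, 14, 1, 36, 8)

for policy in ("sliding", "sticky"):
    chosen = walk(path, start, policy)
    print(policy, [m.date().isoformat() for m in chosen],
          [round(d, 1) for d in drift_days(chosen, start)])
```

With these inputs the sliding walk ends on March 31 while the sticky walk returns to May 14, and the total sticky drift comes out at roughly one third of the sliding drift, consistent with the comparison in note 16.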
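The definitions in note 20 (walk length, choice, unique domains, drift) can be computed from a recorded walk. The following is a minimal sketch under stated assumptions: the `Step` record and `walk_stats` helper are hypothetical, a walk ends at the first step without HTTP 200 for both timemap and memento, and a "domain" is approximated as the last two host labels. That approximation keeps wordpress.com as one domain, as the note describes, but it mishandles suffixes such as co.uk. The toy walk and its link sets are placeholders, not data from the study.

```python
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import urlsplit


@dataclass
class Step:
    uri: str
    timemap_status: int
    memento_status: int
    memento_datetime: datetime
    links: set = field(default_factory=set)  # unique (common) links on the page


def domain(uri):
    """Assumption: a domain is the last two host labels (www.cs.odu.edu -> odu.edu).

    This mishandles suffixes such as co.uk; a public-suffix list would be more robust.
    """
    host = urlsplit(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


def walk_stats(steps, target):
    """Compute walk length, choice, unique domains, and per-step drift (seconds)."""
    successful = []
    for step in steps:
        if step.timemap_status != 200 or step.memento_status != 200:
            break  # a non-200 response ends the walk
        successful.append(step)
    return {
        "walk_length": len(successful),
        "choice": sum(len(s.links) for s in successful),
        "unique_domains": len({domain(s.uri) for s in successful}),
        "drift_seconds": [abs(s.memento_datetime - target).total_seconds()
                          for s in successful],
    }


# Hypothetical toy walk; link sets are placeholders.
target = datetime(2005, 5, 14, 1, 36, 8)
steps = [
    Step("http://www.cs.odu.edu/", 200, 200, target, {"a", "b", "c"}),
    Step("http://www.odu.edu/", 200, 200, datetime(2005, 4, 22, 0, 17, 52), {"d", "e"}),
    Step("http://www.vtext.com/", 404, 404, target),
]
print(walk_stats(steps, target))
```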
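Note 22 describes timemaps as a computer-readable form of the calendar page, from which a memento is randomly selected. The following is a minimal sketch of parsing a timemap in the link format used by the Memento protocol (RFC 7089) and picking one memento at random. The sample timemap is illustrative, written in that format using datetimes from the slides, and is not copied from the slide itself. The regex-based parser is a simplification that assumes well-formed input, not a full link-format parser.

```python
import random
import re
from email.utils import parsedate_to_datetime

# Illustrative timemap in RFC 7089 link format (not copied from the slide).
SAMPLE_TIMEMAP = """<http://www.cs.odu.edu/>; rel="original",
<http://web.archive.org/web/timemap/link/http://www.cs.odu.edu/>; rel="self"; type="application/link-format",
<http://web.archive.org/web/20050331091610/http://www.cs.odu.edu/>; rel="first memento"; datetime="Thu, 31 Mar 2005 09:16:10 GMT",
<http://web.archive.org/web/20050514024846/http://www.cs.odu.edu/>; rel="memento"; datetime="Sat, 14 May 2005 02:48:46 GMT"
"""

ENTRY = re.compile(r"<([^>]*)>([^<]*)")  # a URI plus everything up to the next "<"
PARAM = re.compile(r'(\w+)="([^"]*)"')    # key="value" link parameters


def parse_timemap(text):
    """Return sorted (Memento-Datetime, URI-M) pairs from a link-format timemap."""
    mementos = []
    for uri, params in ENTRY.findall(text):
        attrs = dict(PARAM.findall(params))
        # rel can hold several tokens, e.g. "first memento" or "last memento".
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return sorted(mementos)


mementos = parse_timemap(SAMPLE_TIMEMAP)
dt, uri_m = random.Random(2013).choice(mementos)  # seeded so runs are repeatable
print(len(mementos), dt.isoformat(), uri_m)
```

Splitting entries on "<" rather than on commas matters here, because the HTTP dates inside the datetime parameters themselves contain commas.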
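Note 35 prefers the median because the drift data is highly variable. The following is a minimal sketch of grouping drift values by step across walks of different lengths and comparing median with mean. It assumes each walk is stored as a list of drift values in step order. The `drift_by_step` helper and the toy numbers are hypothetical and only illustrate how one large outlier pulls the mean far more than the median.

```python
from collections import defaultdict
from statistics import mean, median


def drift_by_step(walks):
    """Group drift values by step number (1-based) across walks of differing length."""
    by_step = defaultdict(list)
    for drifts in walks:
        for step, d in enumerate(drifts, start=1):
            by_step[step].append(d)
    return dict(sorted(by_step.items()))


# Hypothetical drift values (days) for three walks of different lengths.
walks = [[0, 10, 30], [0, 2, 40], [5, 4, 500, 7]]
for step, ds in drift_by_step(walks).items():
    print(step, "median", median(ds), "mean", round(mean(ds), 1))
```

At step 3 the single 500-day outlier raises the mean to 190 while the median stays at 40. Later steps also contain fewer walks, so each per-step statistic rests on a shrinking sample.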