Not All Mementos are Created Equal: Measuring the Impact of Missing Resources 
Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson 
Old Dominion University 
{jbrunelle, mkelly, hany, mweigle, mln}@cs.odu.edu 
1
Goal: Automatically measure the quality of the archives 
•Four example mementos: 20% missing, 14% missing, 28% missing, 7% missing 
5 
“Live” XKCD 
•Missing 17% of embedded resources 
•Looks complete 
6
“Live” XKCD 
•Take three resources: 
  •Logo 
  •Main Comic 
  •Navigation Strip 
•Relative importance? 
•All present in “Live” XKCD 
7
Damaging XKCD 
•Created a local memento 
•Removed the logo and navigation strip 
•Now missing 29% of embedded resources 
•Human assessment: looks OK 
8
Damaging XKCD 
•From our local memento 
•Removed the Main Comic 
•Now missing 24% of embedded resources 
•Human assessment: Not a usable memento 
•Percent of missing embedded resources is not a suitable metric for memento quality 
10
Image Importance 
•Size (as percentage of all pixels) 
•Position (in viewport?) 
•Centrality (in the vertical or horizontal center?) 
13
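The three importance factors above can be combined into a single score. The following is a minimal sketch, not the authors' implementation: the `Image` record, the function name, the binary treatment of position and centrality, and the three weights are all assumptions made for illustration, since the slides do not state how the factors are weighted.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical record for an embedded image's rendered box (pixels)."""
    x: int
    y: int
    width: int
    height: int


def image_importance(img, page_w, page_h, viewport_h,
                     w_size=0.5, w_position=0.25, w_centrality=0.25):
    """Score an image by size, position, and centrality (weights are assumed)."""
    # Size: share of all page pixels covered by the image.
    size = (img.width * img.height) / (page_w * page_h)
    # Position: 1.0 if the image starts inside the initial viewport.
    position = 1.0 if img.y < viewport_h else 0.0
    # Centrality: 1.0 if the image spans the horizontal or vertical
    # center line of the viewport.
    center_x, center_y = page_w / 2, viewport_h / 2
    spans_center_x = img.x <= center_x <= img.x + img.width
    spans_center_y = img.y <= center_y <= img.y + img.height
    centrality = 1.0 if (spans_center_x or spans_center_y) else 0.0
    return w_size * size + w_position * position + w_centrality * centrality


# A large, centered comic scores higher than a small logo in the corner.
comic = Image(x=200, y=150, width=600, height=400)
logo = Image(x=0, y=0, width=100, height=50)
print(image_importance(comic, 1000, 1000, 800))
print(image_importance(logo, 1000, 1000, 800))
```

Under these assumed weights the comic outranks the logo, which matches the human assessment of the damaged XKCD mementos.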
Missing CSS 
•Damage not limited to images 
•When missing CSS, content shifts left 
14
Missing CSS 
•Partitioned snapshot into thirds 
•Background color determined 
•Pixel-by-pixel comparison 
15
Missing CSS 
•Calculated the amount of content in each vertical third 
•If >=80% in left column and missing CSS, CSS is important 
•Only performed if stylesheets are missing 
16
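The CSS check described on the three slides above can be sketched as follows. This is an illustration under stated assumptions: the snapshot is modeled as a plain list of pixel rows, the background is taken to be the most common color (the slides say only that the background color is determined), and the function names are hypothetical.

```python
from collections import Counter


def left_third_share(pixels):
    """Return the share of non-background pixels in the left vertical third.

    `pixels` is a list of rows; each row is a list of color values.
    Assumption: the background is the most common color in the snapshot.
    """
    width = len(pixels[0])
    background = Counter(p for row in pixels for p in row).most_common(1)[0][0]
    counts = [0, 0, 0]  # content pixels in the left, middle, right thirds
    for row in pixels:
        for x, color in enumerate(row):
            if color != background:  # pixel-by-pixel comparison
                counts[3 * x // width] += 1
    total = sum(counts)
    return counts[0] / total if total else 0.0


def css_is_important(pixels, stylesheets_missing, threshold=0.8):
    """Flag CSS as important when content has shifted into the left third."""
    if not stylesheets_missing:  # only performed if stylesheets are missing
        return False
    return left_third_share(pixels) >= threshold


# 0 is the background color; 1 marks content.
shifted = [[1, 1, 1, 0, 0, 0, 0, 0, 0]] * 4
centered = [[0, 0, 0, 1, 1, 1, 0, 0, 0]] * 4
print(css_is_important(shifted, stylesheets_missing=True))
print(css_is_important(centered, stylesheets_missing=True))
```

The 0.8 default mirrors the >=80% threshold on the slide; everything else is a simplification of a real screenshot comparison.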
Percent Missing vs. Weighted Damage 
•$M_M$ = Percent of embedded resources missing 
$M_M = \frac{\text{Embedded Resources Missing}}{\text{Total Embedded Resources}}$ 
•$D_M$ = Damage rating of missing embedded resources 
$D_M = \frac{D_{M_{Actual}}}{D_{M_{Potential}}}$ 
$D_{M_{Potential}} = \sum_{i=1}^{n_{[I|MM]}} \frac{D_{[I|MM]}(i)}{n_{[I|MM]}} + \sum_{i=1}^{n_{[C]}} \frac{D_{[C]}(i)}{n_{[C]}}$ 
where $I$ = Image, $MM$ = MultiMedia, $C$ = CSS 
17
Calculated Damage 
•$M_M$ = Percent of embedded resources missing 
•$D_M$ = Damage rating of missing embedded resources 
•Logo and navigation strip removed: $M_M = 0.29$, $D_M = 0.36$ 
•Main Comic removed: $M_M = 0.24$, $D_M = 0.41$ 
18
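The two measures can be compared with a short sketch. The slides define potential damage as the average per-resource damage over images and multimedia plus the average over stylesheets, but they do not define the actual damage. The code below assumes it is the same sum restricted to the resources that are missing. The resource records and per-resource damage values are made up for illustration and do not reproduce the XKCD numbers.

```python
def percent_missing(resources):
    """M_M: fraction of embedded resources that are missing."""
    return sum(1 for r in resources if r["missing"]) / len(resources)


def _average_damage(resources, kinds, missing_only):
    """Sum of D(i) over a resource group, divided by the group size n."""
    group = [r for r in resources if r["kind"] in kinds]
    if not group:
        return 0.0
    total = sum(r["damage"] for r in group
                if r["missing"] or not missing_only)
    return total / len(group)


def damage_rating(resources):
    """D_M = actual damage / potential damage.

    Assumption: actual damage uses the potential-damage formula but
    counts only the missing resources.
    """
    media = {"image", "multimedia"}
    css = {"css"}
    potential = (_average_damage(resources, media, False)
                 + _average_damage(resources, css, False))
    actual = (_average_damage(resources, media, True)
              + _average_damage(resources, css, True))
    return actual / potential if potential else 0.0


def page(missing_names):
    """Build a hypothetical four-resource page with the named resources missing."""
    base = [("logo", "image", 0.1), ("comic", "image", 0.7),
            ("nav", "image", 0.2), ("style", "css", 0.5)]
    return [{"kind": kind, "damage": damage, "missing": name in missing_names}
            for name, kind, damage in base]


two_small_missing = page({"logo", "nav"})
comic_missing = page({"comic"})
print(percent_missing(two_small_missing), damage_rating(two_small_missing))
print(percent_missing(comic_missing), damage_rating(comic_missing))
```

In this toy data, losing the single important image yields a lower percent missing but a higher damage rating, which is the same pattern the slide reports.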
What do Web users think? 
19
Setting up the Turk Test 
•Amazon Mechanical Turk workers (“turkers”) represent real web users 
•Two legs of the experiment: 
  •Manually damaged memento vs. live resource 
    •10 manually damaged mementos and resources 
  •Real memento vs. real memento 
    •100 URI-Rs, one memento per year 
20
Quantifying Turker Response 
•5 turkers for each comparison 
•Assume $D_A < D_B$ (i.e., A is less damaged) 
•Measure turker agreement: defined only by 4-1 and 5-0 splits 

Votes for Image A   Votes for Image B   Split   Outcome 
5                   0                   5-0     Agreement 
4                   1                   4-1     Agreement 
3                   2                   3-2     Split decision: no agreement 
0                   5                   0-5     No agreement 
29 
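The agreement rule can be expressed as a small classifier. This is a sketch under assumptions: the function names are hypothetical, the slides show only the 5-0, 4-1, 3-2, and 0-5 cases, and the handling of 2-3 and 1-4 splits is inferred from them. The `agreement_rate` helper is a plain tally and is not necessarily how the reported percentages were computed.

```python
def classify_split(votes_a, votes_b, turkers=5):
    """Classify one comparison, assuming image A has the lower damage rating.

    `votes_a` and `votes_b` count the turkers who judged A or B to be
    the less damaged image.
    """
    if votes_a + votes_b != turkers:
        raise ValueError("every turker must vote for exactly one image")
    if votes_a >= turkers - 1:        # 5-0 or 4-1 in favor of A
        return "agreement"
    if abs(votes_a - votes_b) == 1:   # 3-2 or 2-3 (2-3 is an assumption)
        return "split decision"
    return "no agreement"             # 1-4 (assumed) or 0-5


def agreement_rate(splits):
    """Fraction of comparisons where turkers agree with the damage rating."""
    outcomes = [classify_split(a, b) for a, b in splits]
    return outcomes.count("agreement") / len(outcomes)


for split in [(5, 0), (4, 1), (3, 2), (0, 5)]:
    print(split, classify_split(*split))
```

Treating a 3-2 split as no agreement keeps a single dissenting vote from deciding the outcome.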
Turk Results 
•Compared damage ($D_M$) and percent missing ($M_M$) 
•M0: Manually damaged mementos 
•D: Internet Archive mementos 
•M: Percent missing in Internet Archive mementos 
•$D_M$ vs. Live: 78.9% true positives 
•$M_M$ vs. Live: 47.2% true positives 
•Worse than a 50/50 chance! 
•$D_M$ vs. $D_M$: 58.4% true positives 
30
Damage in the Internet Archive 
•1,000 URI-Rs from Bitly 
•1,000 URI-Rs from Archive-It 
•Remove non-HTML representations 
•1,861 URI-Rs remaining 
•Sample 1 memento per year from Internet Archive 
•Measure damage 
31
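The per-year sampling step can be sketched as below. The slides do not say which memento is chosen within a year, so picking the earliest one is an assumption, and the URI-Ms are placeholders rather than real archive addresses.

```python
from datetime import datetime


def one_memento_per_year(mementos):
    """Return one (datetime, uri_m) pair per calendar year.

    Assumption: the earliest memento captured in each year is kept.
    """
    chosen = {}
    for captured, uri_m in sorted(mementos):
        chosen.setdefault(captured.year, (captured, uri_m))
    return [chosen[year] for year in sorted(chosen)]


# Hypothetical capture list for a single URI-R.
captures = [
    (datetime(2011, 6, 1), "https://archive.example/2011-06/page"),
    (datetime(2010, 9, 15), "https://archive.example/2010-09/page"),
    (datetime(2010, 2, 3), "https://archive.example/2010-02/page"),
    (datetime(2012, 1, 20), "https://archive.example/2012-01/page"),
]
for captured, uri_m in one_memento_per_year(captures):
    print(captured.year, uri_m)
```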
Damage in the Internet Archive 
•Measured Internet Archive mementos 
•Damage generally improves over time 
•Despite missing more resources over time 
32
Conclusions 
•$D_M$ is a better measure of memento quality than $M_M$ 
•On average, the Internet Archive is improving its quality over time 
•Internet Archive is also missing more embedded resources over time 
•Improved damage weighting (58.4% correct can be improved) 
•Measure cumulative temporal damage ratings 
  •E.g., a logo that never changes for 10 years and is used by 100 mementos is more important than one used in a single memento. 
33
