Reconstructing the Past with MediaWiki: Programmatic Issues and Solutions
Shawn M. Jones
sjone@cs.odu.edu
Old Dominion University

The Internet Archive attempts to reconstruct web pages via snapshots (Mementos) taken of those pages at various points in time. Many pages change more frequently than the Internet Archive can capture them, so some revisions of a given web page are lost forever. MediaWiki, however, keeps all past revisions of a given page, as well as its associated external resources. This inspired the development of the Memento MediaWiki Extension as an improvement over the Internet Archive's "drive-by" method of digital preservation for MediaWiki sites.

While working on the Memento MediaWiki Extension, effort was put into reconstructing past revisions of each wiki page. The existing software reconstructs the page text as per RFC 7089, but does not try to pull in past versions of images, JavaScript, CSS, and other external resources, because MediaWiki, as it exists, makes it difficult or impossible to load these resources at page generation time.

This curated talk will explore the problems of page reconstruction on the web at large and detail the issues within the MediaWiki code that currently prevent, or complicate, reconstructing a page in its entirety as it looked at a given revision.
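
Under RFC 7089, a client asks for a past version of a resource by sending an Accept-Datetime request header, in HTTP-date format, to a TimeGate. In the backup-slide responses, the TimeGate answers with a 302 redirect to the matching memento. Below is a minimal Python sketch that only builds such a request using the standard library. The helper name `timegate_request` is hypothetical and not part of the Memento MediaWiki Extension. The TimeGate URI in the demo is the one from the demo wiki in the backup slides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate_uri: str, when: datetime) -> Request:
    """Build (without sending) an RFC 7089 datetime-negotiation request.

    `timegate_request` is a hypothetical helper for illustration only.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    # Accept-Datetime uses the RFC 1123 HTTP-date format, always in GMT.
    accept_datetime = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": accept_datetime}, method="GET")


if __name__ == "__main__":
    tg = "http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Daenerys_Targaryen"
    req = timegate_request(tg, datetime(2007, 4, 22, 15, 1, 20, tzinfo=timezone.utc))
    # urllib stores header names via str.capitalize(), hence "Accept-datetime".
    print(req.get_header("Accept-datetime"))  # Sun, 22 Apr 2007 15:01:20 GMT
```

To send the request, pass `req` to `urllib.request.urlopen`. urllib's default redirect handling would then follow the TimeGate's 302 to the memento. The network call is left out so the sketch runs offline.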

Transcript of "Reconstructing the Past with MediaWiki"

  1. Reconstructing the Past with MediaWiki: Programmatic Issues and Solutions Shawn M. Jones sjone@cs.odu.edu Old Dominion University
  2. Reconstructing the Past with the Internet Archive HTML Images JavaScript CSS Our goal: Temporal Coherence. Make the page look as it looked at the time it was archived.
  3. Some Results from the Internet Archive Are Lacking Images change between the time the Archive crawls the main page and the time it gets to the images. Sometimes embedded images are missing when the Archive gets to them. Sometimes the page is designed with a specific browser in mind. Image from “A Framework for Evaluation of Composite Memento Temporal Coherence” by S. Ainsworth, M. L. Nelson, H. Van de Sompel. http://arxiv.org/abs/1402.0928
  4. MediaWiki Shouldn’t Have This Problem HTML Images JavaScript CSS
  5. What we’re not doing
  6. Interest in Reconstructing the Past With MediaWiki
  7. Simplified Memento Overview
  8. Rules for Reconstructing the Past With MediaWiki Do not modify any existing MediaWiki code! Conform to MediaWiki coding standards And…
  9. Reconstructing the Past Articles Templates Embedded Images Embedded JavaScript Embedded CSS
  10. Accessing Old Article Text The oldid argument references a revision of a page within MediaWiki's database. Merely visiting the URI with the oldid will give you the text content of the page as it existed at that revision.
  11. Reconstructing the Past Articles Handled by Memento MediaWiki Extension Templates Embedded Images Embedded JavaScript Embedded CSS
  12. Including the Right Template This gives us: $title - the Title object for the given page; $parser - the Parser object for the given page; $id - the revision ID (oldid) for the Template page. Using $parser and $title, we can change the $id and fetch an old revision of the Template.
  13. Reconstructing the Past Articles Handled by Memento MediaWiki Extension Templates Handled by Memento MediaWiki Extension Embedded Images Embedded JavaScript Embedded CSS
  14. But What About Images? This map is important to understanding the content of this article. This image is changed as the article is changed, to reflect its content.
  15. It’s the same map if we look at the June 6, 2013 revision now. Users can't view this embedded resource as it looked in June 2013 while reading the article from that time period.
  16. What should have happened This is the map from June 2013 that should have been displayed. This is the current map. The content of the article won't match the data in this visual aid, possibly confusing a user who wanted historical information on this topic.
  17. We Tried To Solve This Upon further inspection of the code in MediaWiki, the $time argument from this function is never used, as detailed here.
  18. We Just Solved This Upon further inspection of the code in MediaWiki, the $file argument’s getHistory() function can be used to acquire previous revisions of images.
  19. Reconstructing the Past Articles Handled by Memento MediaWiki Extension Templates Handled by Memento MediaWiki Extension Embedded Images Prototyped for future version of Memento MediaWiki Extension Embedded JavaScript Embedded CSS
  20. What about CSS/JavaScript? The present CSS of this page conflicts with the past Template.
  21. We Couldn’t Solve This The data is present, but we could not find any way for an extension to access or render it.
  22. Recap on Reconstructing the Past Articles Handled by Memento MediaWiki Extension Templates Handled by Memento MediaWiki Extension Embedded Images Prototyped for future version of Memento MediaWiki Extension Embedded JavaScript Requires changes to MediaWiki Embedded CSS Requires changes to MediaWiki
  23. Uniform solution • RFC 7089, Memento, was designed to provide uniform access to past versions of all resources on the Web. • Memento provides a web standard to access these resources.
  24. Resources • Memento Protocol: http://tools.ietf.org/html/rfc7089 • Memento Website: http://www.mementoweb.org/ • Memento MediaWiki Extension: http://www.mediawiki.org/wiki/Extension:Memento • Memento Chrome Extension: http://bit.ly/memento-for-chrome • More details: http://ws-dl.blogspot.com/2014/04/2014-04-01-yesterdays-wiki-page-todays.html • Contact me: sjone@cs.odu.edu
  25. Backup Slides
  26. Sample URI-R (Step 1) HTTP Response
      HTTP/1.1 200 OK
      Date: Sun, 25 May 2014 21:39:02 GMT
      Server: Apache
      X-Content-Type-Options: nosniff
      Link: <http://ws-dl-05.cs.odu.edu/demo/index.php/Daenerys_Targaryen>; rel="original latest-version", <http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Daenerys_Targaryen>; rel="timegate", <http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap"; type="application/link-format"
      Content-language: en
      Vary: Accept-Encoding,Cookie
      Cache-Control: s-maxage=18000, must-revalidate, max-age=0
      Last-Modified: Sat, 17 May 2014 16:48:28 GMT
      Connection: close
      Content-Type: text/html; charset=UTF-8
  27. Sample URI-G (Step 2) HTTP Response
      HTTP/1.1 302 Found
      Date: Sun, 25 May 2014 21:43:08 GMT
      Server: Apache
      X-Content-Type-Options: nosniff
      Vary: Accept-Encoding, Accept-Datetime
      Location: http://ws-dl-05.cs.odu.edu/demo/index.php?title=Daenerys_Targaryen&oldid=1499
      Link: <http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap"; type="application/link-format", <http://ws-dl-05.cs.odu.edu/demo/index.php/Daenerys_Targaryen>; rel="original latest-version"
      Connection: close
      Content-Type: text/html; charset=UTF-8
  28. Sample URI-M (Step 3) HTTP Response
      HTTP/1.1 200 OK
      Date: Sun, 25 May 2014 21:46:12 GMT
      Server: Apache
      X-Content-Type-Options: nosniff
      Memento-Datetime: Sun, 22 Apr 2007 15:01:20 GMT
      Link: <http://ws-dl-05.cs.odu.edu/demo/index.php/Daenerys_Targaryen>; rel="original latest-version", <http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Daenerys_Targaryen>; rel="timegate", <http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap"; type="application/link-format"
      Content-language: en
      Vary: Accept-Encoding,Cookie
      Expires: Thu, 01 Jan 1970 00:00:00 GMT
      Cache-Control: private, must-revalidate, max-age=0
      Connection: close
      Content-Type: text/html; charset=UTF-8
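
Slide 10 says that visiting a page's URI with the oldid argument returns the text of that revision. Slide 27 shows the TimeGate redirecting to exactly such a URI. Below is a minimal Python sketch of building and reading these revision URIs with the standard library. The helper names `oldid_url` and `oldid_from_url` are hypothetical. The script path and page title are the demo wiki's values from the backup slides.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def oldid_url(script_url: str, title: str, oldid: int) -> str:
    """Build a revision URI of the form index.php?title=<title>&oldid=<id>."""
    return f"{script_url}?{urlencode({'title': title, 'oldid': oldid})}"


def oldid_from_url(url: str) -> Optional[int]:
    """Return the oldid query parameter of a URI, or None if it has none."""
    values = parse_qs(urlsplit(url).query).get("oldid")
    return int(values[0]) if values else None


if __name__ == "__main__":
    uri_m = oldid_url("http://ws-dl-05.cs.odu.edu/demo/index.php", "Daenerys_Targaryen", 1499)
    print(uri_m)  # http://ws-dl-05.cs.odu.edu/demo/index.php?title=Daenerys_Targaryen&oldid=1499
    print(oldid_from_url(uri_m))  # 1499
```

The URI built here matches the Location header in the step 2 response. Reading the oldid back out tells a client which revision the TimeGate selected.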
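
Slides 12 and 18 come down to the same decision. Given the time of the article revision being viewed, which revision of a template (or which version of an image) was current then? Below is a sketch of that selection, under a few assumptions:

- The candidate revisions have already been fetched as (ID, timestamp) pairs, sorted oldest first.
- The timestamps use MediaWiki's 14-digit YYYYMMDDHHMMSS format, which sorts correctly as plain strings.

The function `revision_as_of` and the sample IDs and timestamps are hypothetical. This is not the extension's own code.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

# (revision or file-version ID, MediaWiki timestamp "YYYYMMDDHHMMSS")
Revision = Tuple[int, str]


def revision_as_of(revisions: Sequence[Revision], as_of: str) -> Optional[int]:
    """Return the ID of the newest revision saved at or before `as_of`.

    `revisions` must be sorted oldest first. Returns None when the
    template or image did not exist yet at that time.
    """
    timestamps = [ts for _, ts in revisions]
    # bisect_right returns the position after any entry equal to `as_of`,
    # so a revision saved at that exact second is still selected.
    index = bisect_right(timestamps, as_of)
    return revisions[index - 1][0] if index else None


if __name__ == "__main__":
    history = [(101, "20121115093000"), (245, "20130520174500"), (390, "20140102080000")]
    print(revision_as_of(history, "20130606120000"))  # 245
```

For templates, the selected ID is what slide 12 describes substituting into $id. For images, the candidates would come from the file's getHistory() results mentioned on slide 18. Neither integration is shown here.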
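
The Link header on slide 26 is how a Memento-aware client discovers the TimeGate and TimeMap for an original resource (URI-R). Below is a minimal parser sketch. It assumes the Web Linking syntax, in which every target URI is wrapped in angle brackets. It is a simplified regular-expression approach for illustration, not a complete implementation of that specification, and the name `parse_link_header` is hypothetical.

```python
import re
from typing import Dict, List

# One link-value: <target> followed by zero or more ;param or ;param=value parts.
_LINK_VALUE = re.compile(
    r'<([^>]*)>\s*((?:;\s*[^;,"=\s]+\s*(?:=\s*(?:"[^"]*"|[^;,"\s]*))?\s*)*)'
)
# A single ;name=value parameter; the value may be quoted or a bare token.
_PARAM = re.compile(r';\s*([^;,"=\s]+)\s*(?:=\s*(?:"([^"]*)"|([^;,"\s]*)))?')


def parse_link_header(value: str) -> Dict[str, List[str]]:
    """Map each relation type in a Link header to the target URIs carrying it."""
    relations: Dict[str, List[str]] = {}
    for link in _LINK_VALUE.finditer(value):
        target, params = link.group(1), link.group(2)
        for param in _PARAM.finditer(params):
            if param.group(1).lower() != "rel":
                continue
            rel_value = param.group(2) if param.group(2) is not None else (param.group(3) or "")
            # rel may hold several space-separated types, e.g. "original latest-version".
            for rel in rel_value.split():
                relations.setdefault(rel.lower(), []).append(target)
    return relations


if __name__ == "__main__":
    header = (
        '<http://ws-dl-05.cs.odu.edu/demo/index.php/Daenerys_Targaryen>; rel="original latest-version", '
        '<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Daenerys_Targaryen>; rel="timegate", '
        '<http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap"; '
        'type="application/link-format"'
    )
    print(parse_link_header(header)["timegate"])
    # ['http://ws-dl-05.cs.odu.edu/demo/index.php/Special:TimeGate/Daenerys_Targaryen']
```

Quoted parameter values are matched as a whole, so commas inside them (as in a datetime parameter) do not split a link. TimeMaps are served as application/link-format, which uses the same link-value syntax. The same function should therefore also handle a simple TimeMap body, though that is untested here.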
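
Slide 28 shows that a memento (URI-M) identifies itself with a Memento-Datetime header. Slide 3 describes the Archive fetching embedded images later than the page that embeds them. The sketch below reads Memento-Datetime with Python's standard library and applies a deliberately crude check. It flags any embedded resource whose memento is dated after the root page's memento as possibly incoherent. This is a simplification for illustration, not the classification framework of the paper cited on slide 3. The function names and the sample image names and dates are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Mapping, Optional


def memento_datetime(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return a response's Memento-Datetime, or None if it carries no such header."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == "memento-datetime":
            # RFC 7089 requires GMT, so the parsed value is timezone-aware.
            return parsedate_to_datetime(value)
    return None


def captured_after_root(root: datetime, embedded: Dict[str, datetime]) -> List[str]:
    """List embedded resources whose memento postdates the root page's memento."""
    return sorted(uri for uri, captured in embedded.items() if captured > root)


if __name__ == "__main__":
    root = memento_datetime({"Memento-Datetime": "Sun, 22 Apr 2007 15:01:20 GMT"})
    embedded = {  # hypothetical capture times for two embedded images
        "map.png": datetime(2007, 4, 22, 15, 1, 5, tzinfo=timezone.utc),
        "logo.png": datetime(2007, 4, 23, 9, 0, 0, tzinfo=timezone.utc),
    }
    print(captured_after_root(root, embedded))  # ['logo.png']
```

A resource captured later has not necessarily changed, so a flag here only marks something to inspect. This is also why MediaWiki, which keeps every revision, should in principle be able to avoid the problem.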