SlideShare a Scribd company logo
1 of 197
Avoiding Spoilers On MediaWiki
Fan Sites Using Memento
By
Shawn M. Jones
sjone@cs.odu.edu
Committee:
Dr. Michael L. Nelson
Dr. Michele C. Weigle
Dr. Irwin B. Levinstein
Warning: This presentation may contain spoilers
Motivation 2
It Started With A Discussion At Work
About This Guy From Game Of Thrones
Motivation 3
So We Use Fan Wikis, Because They
Are Useful For Our Discussions
Motivation 4
http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon
But The Current Page For This
Character Contains Spoilers
Motivation 5
But The Current Page For This
Character Contains Spoilers
Motivation 6
But The Current Page For This
Character Contains Spoilers
Motivation 7
But The Current Page For This
Character Contains Spoilers
Motivation 8
But The Current Page For This
Character Contains Spoilers
Motivation 9
We All Enjoy Some Episodic Fiction
So Much…
Motivation 10
…That Fans Have Created Wikis…
Motivation 11
…And The Rest Of Us Read Them
Motivation 12
So, What If We Could Avoid The
Spoilers By Using Past Wiki Pages?
Motivation 13
http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon?oldid=125053
Order Of Discussion
• Background
• Related Work
• TimeGate Heuristics
• Theory Of Spoiler Probability
• Measurements Of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
14
BACKGROUND
Building To The Naïve Spoiler Concept
15
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
Most Of Us Are Familiar With
The Web Browser And HTML
Background 16
The Web Browser is how we
view web pages
HTML is what the
browser parses to
render the page
Hypertext Transfer Protocol (HTTP)
Is What Delivers Pages To The Browser
Background 17
Here Is An Example HTTP Request
From Google Chrome
Background 18
And The Interesting Parts Of Our
Request…
Background 19
Get me /wiki/The_Hunger_Games from
en.wikipiedia.org
Here Is An Example HTTP Response
From Wikipedia’s Server For That Page
Background 20
And The Interesting Parts Of Our
Response…
Background 21
OK, I Have What You
Are Looking For
This Is How Big It Is
This Is When It Was
Last Changed
Newline Indicating
Start Of HTML
HTML Starts Here And You Know How Big It Is,
So Stop Reading When You Get 13769 Bytes
Introducing Web Architecture:
Resources Can Have Many Representations
Background 22
In 1996 Tim Berners-Lee Discussed
Different Dimensions Of Representation
http://www.w3.org/DesignIssues/Generic.html
Background 23
These Representations Are Requested
And Identified By HTTP Headers
Request Header Response Header Dimension
Accept Content-Type Content type of Representation
Accept-Language Content-Language Language of Representation
Accept-Encoding Content-Encoding Medium of Representation
Accept-Charset Content-Type Character Set of Representation
Background 24
These Headers Were Specified In RFC 2616
Their Updated Specification Is In RFC 7231
These Representations Are Requested
And Identified By HTTP Headers
Request Header Response Header Dimension
Accept Content-Type Content type of Representation
Accept-Language Content-Language Language of Representation
Accept-Encoding Content-Encoding Medium of Representation
Accept-Charset Content-Type Character Set of Representation
Background 25
The term for this is Content Negotiation
It works so well that many are unaware of its ubiquity
Content Negotiation Does Not Create Representations,
It Only Directs The User To Ones That Already Exist…
Using Our Headers Example,
Chrome Tells The Server What You Want
Background 26
I prefer image/webp, and
I want it compressed with gzip, and
I want it in US English
Using Our Headers Example,
Wikipedia Tells Chrome What It Returns
Background 27
My response is compressed with gzip, and
My response is in US English
This response is in text/html with
a character set of UTF-8
Memento Finally Completes The Set
By Including Time
Request Header Response Header Dimension
Accept Content-Type Content type of Representation
Accept-Language Content-Language Language of Representation
Accept-Encoding Content-Encoding Medium of Representation
Accept-Charset Content-Type Character Set of Representation
Accept-Datetime Memento-Datetime Time of the Representation
Background 28
RFC 7089 Defines Memento, Allowing Us To Negotiate In Time
Memento Works In Several Steps
Background 29
Current Page
Archived Page
From The Past
Resource That
Redirects To
Archived Pages
At First The Client Hopes To Get A
TimeGate URI From the Server, But…
Background 30
Current Page
Archived Page
From The Past
Resource That
Redirects To
Archived Pages
Most Servers Do Not
Yet Return This Data
So Memento Clients
Default To Known
TimeGates
Then, A Browser Asks A TimeGate For A
Specific Page From A Specific Datetime
Background 31
I Want This Page On This Date
And The TimeGate Tells The Browser
Where It Can Go
Background 32
Go To This URI To Get The Best
Memento For The Date You Requested
Do Not Stop Here, I Am
Redirecting You Elsewhere
Then The Browser Gets A Memento
Like Any Other Web Page
Background 33
This Is The Datetime Of This Memento
Here’s Everything I Know About
Archives For That Page
Memento Also Provides TimeMaps As A
Machine-Readable List Of Mementos
Background 34
Memento For Chrome
Lets Users “Right-Click Into The Past”
Background 35
http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
Mink Lets Users See All Mementos For
A Page
Background 36
http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
Who Maintains Mementos?
Background 37
How Are Mementos Acquired?
Background 38
The Internet Archive Provides
The Wayback Machine
Background 39
http://archive.org/web/
Using The Wayback Machine
One Can Select A URI And Datetime…
Background 40
http://web.archive.org/web/*/http://lostpedia.wikia.com
…And See The Memento From The
Internet Archive
Background 41
http://web.archive.org/web/20140214065217/http://lostpedia.wikia.com/wiki/Main_Page
The Internet Archive Contains
Mementos For Wiki Pages
Background 42
http://web.archive.org/web/20040204011112/http://en.wikipedia.org/wiki/Abraham_Lincoln
Here Is An Example Wiki Article…
Background 43
http://en.wikipedia.org/wiki/Abraham_Lincoln
Wikis Are Edited By Multiple Authors,
Resulting In Many Revisions…
Background 44
http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=history
Most Interesting: Wikis Keep
EVERY REVISION OF A PAGE!!!
Background 45
http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=history
With Every Revision Preserved,
One Can Visit Old Revisions Of Pages
Background 46
http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&oldid=345783631
For A Wiki, Every Web Archive Memento
Can Be Tied To A Wiki Revision
Background 47
Background 48
We Also Know Which
Revisions Were Missed…
For A Wiki, Every Web Archive Memento
Can Be Tied To A Wiki Revision
The Web Archives Are Sampling The Wiki Revisions…
Consider The Timeline Of Every
Episode In A Series
Background 49
Now Consider A Timeline
Of Wiki Revisions Created By Fans
Background 50
Finally Consider A Third Timeline For
Mementos Created From Those Revisions
Background 51
Steiner Noticed That
Events Inspire Wiki Revisions
Background 52
… or episodes inspire wiki edits.
http://arxiv.org/abs/1303.4702
Bringing Us To The Naïve Spoiler Concept:
Revisions After An Episode Contain Spoilers
Background 53
Bringing Us To The Naïve Spoiler Concept:
Revisions After An Episode Contain Spoilers
Background 54
RELATED WORK
We Are Not The Only Ones Seeking To Avoid Spoilers…
55
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
So, What Have Others Done About
Spoilers?
Related Work 56
Spoiler Notices!!!
Hidden Text!!!
Blocking Social
Media Posts!
Existing Academic Studies
Have Dealt With Social Media
• Separate studies conducted by Johns and the team of
Schirra, Sun, and Bently studied two-screen viewing
• Boyd-Graber, Glasgow, and Zajac attempted to use
machine learning to find spoilers in social media
• To avoid spoilers fans would avoid, or abandon:
– Social media
– Online web pages
– TV shows
• This results in lost revenue to advertisers!
Related Work 57
Our Goals
• Work with wikis, not social media
• Not just warn the user!
• Not hide the data!
• Show the user what existed before the spoiler
was revealed, so the resource is still useful.
Related Work 58
TIMEGATE HEURISTICS
TimeGates Hold The Key To Avoiding Spoilers
59
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
TimeGates Can Be Represeted By A
Function
TimeGate Heuristics 60
M = memento returned (URI-M)
R = original resource (URI-R)
ta = desired datetime
h = heuristic being used for TimeGate
mindist Is The Most Widely Used Heuristic
TimeGate Heuristics 61
Minimum Distance From ta With No Bounds
But mindist Can Lead To Spoilers…
TimeGate Heuristics 62
Minimum Distance – no bounds
spoiler
minpast Does Not Lead To Spoilers
TimeGate Heuristics 63
Minimum Distance In The Past Where Upper Bound = ta
minfutr Is The Opposite Of minpast
TimeGate Heuristics 64
Minimum Distance In The Future Where Lower Bound = ta
minfutr Always Leads To Spoilers
TimeGate Heuristics 65
Minimum Distance In The Future Where Lower Bound = ta
spoiler
Other Heuristics Can Lead To Spoilers
• minnear – bounds specified by user/system
• eqpast – compare on both sides of ta, pick
past if equal, mindist if not
• eqfutr – compare on both sides of ta, pick
future if equal, mindist if not
• simpast – compare on both sides of ta, pick
past if similar, mindist if not
• simfutr – compare on both sides of ta, pick
future if similar, mindist if not
TimeGate Heuristics 66
We Compared These Heuristics Based On
Performance And Spoiler Avoidance
TimeGate Heuristics 67
Even If Modified To Default To minpast,
These Heuristics Perform More Poorly
TimeGate Heuristics 68
minpast Is Best For Avoiding Spoilers
TimeGate Heuristics 69
THEORY OF SPOILER PROBABILITY
Using mindist Can Be Hazardous For Avoiding Spoilers
70
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
Remember: Consider The Timeline Of
Every Episode In A Series
Background 71
Remember: Consider A Timeline
Of Wiki Revisions Created By Fans
Background 72
Remember: Consider A Third Timeline For
Mementos Created From Those Revisions
Background 73
Remember Our Naïve Spoiler Concept
Theory of Spoiler Probability 74
Spoiler Areas Exist Where mindist Returns
The User A Memento From The Future Of
Their Desired Datetime
Theory of Spoiler Probability 75
In this example, using mindist, if the user requests a memento with a datetime
between t9 and t11, denoted by the red area, they will get mk (which is rj) ,
which exists in the future, even though they chose a datetime before mk!
Conditions Arising From These Timelines
Are Defined By When They Occur
• Pre-Archive – occurs prior to first memento
• Archive-Extant – occurs after first memento
• Post-Archive – occurs after last memento
Theory of Spoiler Probability 76
Condition: Pre Archive Spoiler Area
Theory of Spoiler Probability 77
Spoiler area exists for e3, and one still exists for e2
because m1 maps to rj, which is after e3
Condition: Pre Archive Safe Area
Theory of Spoiler Probability 78
No spoiler area for e3, but one still exists for e2
because m1 maps to rj, but rj is before e3
Condition: Archive Extant Spoiler Area
Theory of Spoiler Probability 79
Condition: Archive Extant Safe Area
EHR
Theory of Spoiler Probability 80
Condition: Archive Extant Safe Area
HRE
Theory of Spoiler Probability 81
The Area Between the First and Last
Episodes Is A Potential Spoiler Zone
Theory of Spoiler Probability 82
Using Spoiler Areas And A Potential
Spoiler Zone, We Can Calculate The
Probability Of Spoiler For A Page
Theory of Spoiler Probability 83
s = # of seconds where we are in a spoiler area
c = # of seconds between e1 and en
MEASUREMENTS OF SPOILER
PROBABILITY
Actual Spoiler Areas From Actual Wikis
84
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
Sixteen Lucky Fan Wikis Were Selected
From wikia.com…
Measurements of Spoiler Probability 85
We See That Not All Pages Are
Archived
Measurements of Spoiler Probability 86
None of
these are
100%
In fact,
38% of the
total is not
available in
the
Internet
Archive.
For Pages With Mementos
We Sought To Find Spoiler Areas
Measurements of Spoiler Probability 87
[ (ts1, tf1), (ts2, tf2), … (tsn, tfn) ]List of Spoiler Areas
Pre-Archive
Equation
Archive-Extant
Equation
Revisions from
Wiki export pages
Mementos from
Internet Archive
TimeMaps
Episode Dates
From epguides.com
Spoiler Areas Do Exist!
Measurements of Spoiler Probability 88
Spoiler Areas Do Exist!
Measurements of Spoiler Probability 89
One Can See Whole Seasons Clumped Together
Fewer Mementos Result In Fewer Chances of Spoiler
We Discovered Different Categories of
Pages
• Normal
• Wiki-Before-Show
• Season-In-A-Day
Measurements of Spoiler Probability 90
Normal Pages
Are Started After The Series Starts
Measurements of Spoiler Probability 91
Normal Pages
Are Started After The Series Starts
Measurements of Spoiler Probability 92
Pre-Archive
Spoiler Areas
Begin At First
Episode
Wiki-Before-Show Pages
Are Started Prior To The First Episode
Measurements of Spoiler Probability 93
Season-in-a-Day Pages
Have Many Episodes In A Single Day
Measurements of Spoiler Probability 94
For Normal Pages, We See A Variety of
Probabilities
Measurements of Spoiler Probability 95
For Wiki-Before-Show Pages, We Also
See A Variety of Probabilities
Measurements of Spoiler Probability 96
Our Model Breaks Down For
Season-In-A-Day
Measurements of Spoiler Probability 97
13 spoiler
areas
exist for
this page,
but they
all have
length 0
The Probabilities Do Not Follow A
Known Distribution
Measurements of Spoiler Probability 98
50% Of The Pages Have A Spoiler
Probability < 0.66
Measurements of Spoiler Probability 99
Background 100
These Revisions Were Missed
By The Web Archive…
Remember Missed Updates?
We Had An Opportunity To Measure
Missed Updates
Measurements of Spoiler Probability 101
We See Lines Where The Archive
Adopts A More Aggressive Policy
Measurements of Spoiler Probability 102
Notice How The Colors
To The Right Get Lighter, Indicating
Fewer Missed Updates
Looking At Redundant Mementos,
We See The Opposite Effect
Measurements of Spoiler Probability 103
Notice How The Colors
To The Right Get Darker, Indicating
More Redundant Mementos
Looking At Redundant Mementos,
We See The Opposite Effect
Measurements of Spoiler Probability 104
Remember That 38% of the
Pages Do Not Have Mementos,
So This Does Not Reflect All
Archived Pages
SPOILERS IN THE WAYBACK
MACHINE
The Wayback Machine Uses mindist, Ergo Users Encounter Spoilers There…
105
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
We Have Access To Some Anonymized
Logs From The Wayback Machine
Spoilers In The Wayback Machine 106
From these logs, we have:
• which memento the user came from (the referer)
• the memento returned to the user
The Wayback Machine Rewrites URIs, Redirecting Users When They Click
On Links To Other Pages That Exist During The Same Time Period.
We Can Infer Desired Datetime And
Have The Memento-datetime
Spoilers In The Wayback Machine 107
From the referrer, we can infer their desired datetime
From the memento returned to the user, we can acquire the Memento-Datetime
Using the Logs, We Can See How Many
wikia.com Requests End In Spoilers
Spoilers In The Wayback Machine 108
Revisions From
Wiki Export Pages
Wayback Logs
Desired Datetime
From Referrer
Memento-datetime
From Visited URI
Datetime Of Revision
Did The
Wayback Machine
Deliver The User To A
Revision That Existed In The
Future From The Desired
Datetime?
Safe Spoiler
No Yes
Wayback Machine Log Results
For wikia.com Requests
Spoilers In The Wayback Machine 109
We Have No Way To Identify Wikis Other Than By Domain Name,
Data Does Not Include All Wikis…
THE MEMENTO MEDIAWIKI
EXTENSION
A Solution For Avoiding Spoilers In Wikis. Wiki Revisions Are Mementos, Too!
110
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
We Developed The Memento
MediaWiki Extension To Use minpast
The Memento MediaWiki Extension 111
MediaWiki Is The Software Used By
Wikipedia And Wikia.Com
http://www.mediawiki.org/wiki/Extension:Memento
We Set Up A Demonstration Wiki…
The Memento MediaWiki Extension 112
http://ws-dl-05.cs.odu.edu/demo/index.php/Main_Page
Data was exported from: http://awoiaf.westeros.org
For Performance, We Had An Opportunity
To Experiment With Alternate Memento
Patterns
The Memento MediaWiki Extension 113
Original Resource Acts As Own
TimeGate (Pattern 1.1)
The Memento MediaWiki Extension 114
This Pattern Only Requires 2 Requests To Acquire A Memento.
Intuitively, It Should Perform Better.
Client Must Ask About TimeGate
(Pattern 2.1)
The Memento MediaWiki Extension 115
This Pattern Requires 3 Requests To Acquire A Memento.
Intuitively, It Should Perform Worse.
Using Analysis and Experimentation
We Found That The Three-Request
Pattern Actually Performed Better
The Memento MediaWiki Extension 116
Mediawiki Takes Too Long To Generate A
TimeGate Response If The Two Are The
Same Page.
The Three-request Pattern Requires An
Extra Request, But, Due To Round-trip
Time, It Performs Better Unless The User
Has A Bandwidth Of 21,926 bps Or Less.
Serving Current Pages Takes The Same
Time With Or Without The Extension
The Memento MediaWiki Extension 117
Worse Performance
Better Performance
We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876
Mean = -0.0072 s
Std dev = 0.3526 s
Serving Old Pages (Mementos) Takes The
Same Time With Or Without The Extension
The Memento MediaWiki Extension 118
Better Performance
Worse Performance
We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876
Mean = -0.0026 s
Std dev = 0.0421 s
TimeMaps Are Smaller Than Wiki
History Pages, So They Perform Better
The Memento MediaWiki Extension 119
Better Performance
Mean = -34.74 kB
Std dev = 31.12 kB
We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876
MediaWiki Does Not Just Store
Previous Revisions Of Pages…
The Memento MediaWiki Extension 120
Images, Stylesheets, and JavaScript
have previous revisions stored as
well
Then We Wondered, What About
Images? They Could Contain Spoilers…
The Memento MediaWiki Extension 121
Consider This Map
This Map is important to
understanding the
content of this article
This image is changed
as the article is
changed, to reflect its
content
The Memento MediaWiki Extension 122
http://en.wikipedia.org/wiki/Same-sex_marriage_law_in_the_United_States_by_state
It’s The Same Map If Today We Visit
The June 5, 2013 Revision
Users can't view this
embedded resource as
it looked on June 2013
while reading the article
from that time period
http://en.wikipedia.org/w/index.php?title=Same-sex_marriage_law_in_the_United_States_by_state&oldid=558400004
123The Memento MediaWiki Extension
What Should Have Happened
This is the the map from
June, 2013 that should
have been displayed
This is the current map
The content of the article won't match the data in this visual aid, possibly
confusing a user who wanted historical information on this topic
The Memento MediaWiki Extension 124
We Developed A Solution For Images
The $file argument’s getHistory() function of the ImageBeforeProduceHTML
hook can be used to acquire previous revisions of images
The Memento MediaWiki Extension 125
We Could Not Extract Previous
Revisions Of CSS And Javascript…
The data is present, but we could not find any way for an
extension to access or render it.
The Memento MediaWiki Extension 126
We Demonstrated Avoiding Spoilers
At WikiConUSA 2014…
http://ws-dl.blogspot.com/2014/06/2014-06-02-wikiconference-usa-2014-trip.html
127The Memento MediaWiki Extension
Even The Current Version Of The Page
Contains A Spoiler!!!
We want to find information about Kevan Lannister, but haven’t read the book A
Dance with Dragons yet. We set the Memento Chrome Extension prior to the
release of that book: June 29, 2011.
128The Memento MediaWiki Extension
So We Set Memento For Chrome To
The Correct Date…
We use the Memento Chrome Extension to request a revision of the page close to, but
not over, our requested date.
129The Memento MediaWiki Extension
…And Got A Page Without Spoilers
And We Avoid Spoilers for A Dance With Dragons…
130The Memento MediaWiki Extension
FUTURE WORK
Where Do We Go From Here?
131
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
How Do We Handle Spoilers For
Season-In-A-Day Series?
Future Work 132
Can We Create A New Heuristic Based
On mindist For Spoiler Detection?
Future Work 133
Can we process the content of m3 and redirect the user to m2 if
we detect spoilers?
Can We Use The Extension To Avoid
Spoilers For Sports And The News?
Future Work 134
How Do We Use minfutr And minpast To
Study Emerging Topics On Wikipedia
Future Work 135
We Can Do Further Work On Missed
Updates And Redundant Mementos
Future Work 136
CONCLUSIONS
We Got Here…
137
• Background
• Related Work
• TimeGate Heuristics
• Theory of Spoiler Probability
• Measurements of Spoiler Probability
• Spoilers In The Wayback Machine
• The Memento MediaWiki Extension
• Future Work
• Conclusions
We Introduced The Naïve Spoiler Concept
Conclusions 138
We Have Detailed TimeGate Heuristics
Conclusions 139
We Showed How To Calculate The
Probability Of Encountering A Spoiler
Conclusions 140
s = # of seconds where we are in a spoiler area
c = # of seconds between e1 and en
We Calculated Real Spoiler
Probabilities
Conclusions 141
We Showed That the Wayback
Machine Is Serving Spoilers
Conclusions 142
We Developed The Memento
MediaWiki Extension To Use minpast
Conclusions 143
Most of All We Showed That It Is
Possible To Avoid Spoilers In
MediaWiki Fan Sites Using Memento
Conclusions 144
http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon?oldid=125053
Accept-Datetime: Sun, 13 April 2014 00:59:00 GMT
Papers/Presentations
• “Avoiding Spoilers in Fan Wikis of Episodic Fiction”,
with M. L. Nelson (in preparation).
• “Using the Memento MediaWiki Extension To Avoid
Spoilers”, Presentation, WikiConference USA 2014, June
2014.
• “Reconstructing the Past With MediaWiki: Programmatic
Issues and Solutions”, Presentation, WikiConference USA
2014, June 2014.
• “Bringing Web Time Travel to MediaWiki: An Assessment of
the Memento MediaWiki Extension”, with M. L. Nelson, H.
Shankar, and H. Van de Sompel, Tech. Rep. avXiv:1406.3,
Old Dominion University, June 2014.
Papers/Presentations 145
BACKUP SLIDES
Because What You Have Seen Is Just The Tip Of The Iceberg…
146
RELATED WORK
Backup Slides
147
Existing Studies: Leavitt and
Christenfeld
Related Work 148
Existing Studies: Release Date
Different Per Country
• Schirra, Sun, and Bently’s live-tweeting study
on Downton Abbey unearthed a global
problem of avoiding spoilers in shows that air
later for fans in a different country.
• This is further echoed by Leaver’s essay The
Tyranny of Digital Distance discussing the
release of Battlestar Galactica episodes in
Australia.
Related Work 149
Existing Studies: Johns
• Johns studied two-screen viewing (generic
name for live tweeting)
• Fans of particular shows would avoid social
media until they had viewed the latest show
• These fans were trying desperately to avoid
spoilers
• This contradicts Leavitt’s study
Related Work 150
Typically Notices Warn Us of Spoilers
Related Work 151
Some Sites Try To Hide The Spoilers
With HTML And/Or Javascript
Related Work 152
Existing Academic Studies
Have Dealt With Social Media
• Separate studies conducted by Johns and the team of
Schirra, Sun, and Bently studied two-screen viewing
• Boyd-Graber, Glasgow, and Zajac attempted to use
machine learning to find spoilers in social media
• To avoid spoilers fans would avoid, or abandon:
– Social media
– Online web pages
– TV shows
• This results in lost revenue to advertisers!
Related Work 153
Existing Studies: Tsang and Yan
• Fans have abandoned
– Social media
– Online web pages
– TV shows
• Because of the issues in avoiding spoilers
• This results in lost revenue to advertisers!
Related Work 154
Hiding Spoilers: Microformats
Artjom Kurapov created a draft HTML microformat for classifying links and images.
Related Work 155
Hiding Spoilers: Tweetdeck
Related Work
156
Hiding Spoilers: Tumblr Savior
Related Work 157
Hiding Spoilers: Netflix Spoiler Foiler
Related Work 158
Hiding Spoilers: Netflix Spoiler Foiler
Related Work 159
Spoiler Shield Is A Spoiler Blocking Tool
For Social Media
Related Work 160
Hiding Spoilers: Facebook Posts Filter
And Open Tweet Filter
Any Facebook posts with these
keywords will not be displayed.
Any tweets with these
keywords will not be displayed.
Related Work 161
Archiving In MediaWiki: Parsoid
Related Work 162
Archiving In MediaWiki:
Collection Extension
Related Work 163
Seamlessly Viewing Past Versions Of
MediaWiki Pages
Related Work 164
Seamlessly Viewing Past Versions Of
MediaWiki Pages
Wikipedia Proxy Does Not
Address The Problem For
All Wikis.
To Be Generic For Any
Wiki, The Wiki Server Itself
Would Need To Send A
Link Header Back
Indicating The Uri Of The
TimeGate To Use.
Related Work 165
ALTERNATIVES TO THE MEMENTO
MEDIAWIKI EXTENSION
(AND WHY IT IS BETTER!)
Backup Slides
166
Memento Extension vs. Manually
Getting Page Revision
Why Do It When Memento Will Do It For You?
This Is Very Time
Consuming.
Memento Let’s You
Browse Through The
Whole Web With A
Given Date!
167
Memento Extension vs. MediaWiki API
JSON:
{"revid":607345961,"parentid":607210719,"timestamp":"2014-05-06T16:07:52Z”}
XML:
<rev revid="607519915" parentid="607345961" user="Marklemagne"
timestamp="2014-05-07T19:00:26Z"/>
Only A Custom MediaWiki Client Can Turn These Oldid Entries
Into Uris.
Memento Is A Web Standard Way Of Accessing Past Web
Resources And Is Already Implemented For Many Different
Applications (Web Archives, Etc.)
168
Memento Extension vs. MediaWiki API
Link: <http://ws-dl-05.cs.odu.edu/demo-302-recommended-
relations/index.php/Daenerys_Targaryen>; rel="original latest-version",
<http://ws-dl-05.cs.odu.edu/demo-302-recommended-
relations/index.php/Special:TimeGate/Daenerys_Targaryen>; rel="timegate",
<http://ws-dl-05.cs.odu.edu/demo-302-recommended-
relations/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap";
type="application/link-format"; from="Sun, 22 Apr 2007 15:01:20 GMT"; until="Fri, 27
Sep 2013 20:48:24 GMT",
<http://ws-dl-05.cs.odu.edu/demo-302-recommended-
relations/index.php?title=Daenerys_Targaryen&oldid=1499>; rel="first memento";
datetime="Sun, 22 Apr 2007 15:01:20 GMT",
<http://ws-dl-05.cs.odu.edu/demo-302-recommended-
relations/index.php?title=Daenerys_Targaryen&oldid=107643>; rel="last memento";
datetime="Fri, 27 Sep 2013 20:48:24 GMT"
Memento Also Follows The RESTful Principle Of “Follow
Your Nose”, Indicating Additional Resources To Access
From Here. 169
Memento Extension vs. Internet
Archive
The Internet Archive
Only Gets Some Of
The Revisions Of A
Given Page.
MediaWiki Has All
Of The Revisions Of
A Given Page.
Memento Extension vs. Other
MediaWiki Time Travel Extensions
While These Extensions Just Work For MediaWiki,
Memento Works For The Entire Web.
With The Memento Extensions, One Can
Browse The Entire Web Spoiler Free By
Seamlessly Accessing Web Archives And Other
Resources Through Memento.
171
MEMENTO PATTERN COMPARISON IN
THE MEMENTO MEDIAWIKI EXTENSION
Backup Slides
172
Assuming An Original Resource Is a
TimeGate (Pattern 1.1)
The Memento MediaWiki Extension 173
Looking For A TimeGate
(Pattern 2.1)
The Memento MediaWiki Extension 174
Comparing These Two Involves
Evaluating Their Performance
The Memento MediaWiki Extension 175
a = time to generate just a normal wiki page
b = time to perform datetime negotiation when the TimeGate is the same
B = time to perform datetime negotiation when the TimeGate is different
M = time to generate memento
RTTa, RTTb, RTTB, RTTM = round trip times for a, b, B, and M
Using curl
We Obtained the Value of a
The Memento MediaWiki Extension 176
With caching, a turns out to be about 0.1 seconds on average
Using seige
We Obtained Values for b and B
The Memento MediaWiki Extension 177
From these results:
Comparing TimeGate Implementations
The Memento MediaWiki Extension 178
Worse Performance
Using Analysis We Worked To Obtain
The Value Of RTTa
The Memento MediaWiki Extension 179
Round Trip Time is the sum of transmission delay and propagation delay
Transmission delay is # of bits divided by the rate of transmission
Requests and responses are typically about 11,840 bits
Assuming a worst case of 1G telephony (28,000 bps), dt = 0.41 s
Using our previous values for a, b, and B, we see that such a 1G
user would need to experience a dp of 0.13s for the two-
request pattern to perform better.
Continuing Our Analysis We Obtained
The Value Of RTTa
0.13s sounds small, but at the speed of light,
this would require the user to be 24,216.7 miles
from the server
This is almost the circumference of the
Earth!!!
The Memento MediaWiki Extension 180
So, At What Value Of dt Does The
Two-Request Pattern Win Out?
The Memento MediaWiki Extension 181
For most users, the three-request pattern
performs better!
At bandwidth less than 21,926 bps,
the two-request pattern wins out
That’s slightly better than 1G
telephony!
ADDITIONAL ARCHIVE EXTANT SAFE
AREAS
Backup Slides
182
These Are Identified By The Order Of
Occurrence Of Halfway, Revision, and Event
• For example
– HRE – means:
1. Halfway mark between two mementos
2. Revision is created
3. Event occurs
– ERH – means:
1. Event occurs
2. Revision is created
3. Halfway mark between two mementos
Theory of Spoiler Probability 183
Condition: Archive Extant Safe Area
REH
Theory of Spoiler Probability 184
Condition: Archive Extant Safe Area
RHE
Theory of Spoiler Probability 185
Condition: Archive Extant Safe Area
ERH
Theory of Spoiler Probability 186
Condition: Post Archive Safe Area RE
Theory of Spoiler Probability 187
No spoilers after last memento mn
Condition: Post Archive Safe Area ER
Theory of Spoiler Probability 188
No spoilers after last memento mn
ARCHITECTURE OF THE MEMENTO
MEDIAWIKI EXTENSION
Backup Slides
189
We Used Class Inheritance For the
Different Memento Resource Types
The Memento MediaWiki Extension 190
http://www.mediawiki.org/wiki/Extension:Memento
MediaWiki SpecialPages Invoke
TimeGate And TimeMap Functionality
The Memento MediaWiki Extension 191
http://www.mediawiki.org/wiki/Extension:Memento
All Datetime Negotiation Is Centralized
In The TimeNegotiator Class
The Memento MediaWiki Extension 192
http://www.mediawiki.org/wiki/Extension:Memento
The Memento Class Is The Entry Point
For The Extension
The Memento MediaWiki Extension 193
http://www.mediawiki.org/wiki/Extension:Memento
MISCELLANEOUS
Backup Slides
194
MediaWiki Still Has CSS Issues
The Memento MediaWiki Extension 195
Other Uses For The Memento
MediaWiki Extension
Evolving laws and legal discourse
Past software contributions
(Folding@Home)
Changing relationship
between organizations
(ICANN vs. Verisign)
196
Memento Headers Extension
197

More Related Content

Viewers also liked

Pizzicato 20
Pizzicato 20Pizzicato 20
Pizzicato 20
Fosc
 
"Rethinking Digital Research" - Wolfram Data Summit
"Rethinking Digital Research" - Wolfram Data Summit"Rethinking Digital Research" - Wolfram Data Summit
"Rethinking Digital Research" - Wolfram Data Summit
Kaitlin Thaney
 
140127 platinum genomes pedigree analyses
140127 platinum genomes pedigree analyses140127 platinum genomes pedigree analyses
140127 platinum genomes pedigree analyses
GenomeInABottle
 
Water%20 Efficient%20 Landscaping%20 Design
Water%20 Efficient%20 Landscaping%20 DesignWater%20 Efficient%20 Landscaping%20 Design
Water%20 Efficient%20 Landscaping%20 Design
sherylwil
 
Creadores de paginas web
Creadores de paginas webCreadores de paginas web
Creadores de paginas web
Adriana Cepeda
 
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANOEl asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
Jehosua Joya
 

Viewers also liked (18)

Pizzicato 20
Pizzicato 20Pizzicato 20
Pizzicato 20
 
"Rethinking Digital Research" - Wolfram Data Summit
"Rethinking Digital Research" - Wolfram Data Summit"Rethinking Digital Research" - Wolfram Data Summit
"Rethinking Digital Research" - Wolfram Data Summit
 
How to AutoTweet from your own Facebook App
How to AutoTweet from your own Facebook AppHow to AutoTweet from your own Facebook App
How to AutoTweet from your own Facebook App
 
Recepta de cuina
Recepta de cuinaRecepta de cuina
Recepta de cuina
 
Abaka 31 03-2014
Abaka 31 03-2014Abaka 31 03-2014
Abaka 31 03-2014
 
140127 platinum genomes pedigree analyses
140127 platinum genomes pedigree analyses140127 platinum genomes pedigree analyses
140127 platinum genomes pedigree analyses
 
Harnessing the Power of Media to Grow Business
Harnessing the Power of Media to Grow BusinessHarnessing the Power of Media to Grow Business
Harnessing the Power of Media to Grow Business
 
La plastica de la luz
La plastica de la luzLa plastica de la luz
La plastica de la luz
 
Water%20 Efficient%20 Landscaping%20 Design
Water%20 Efficient%20 Landscaping%20 DesignWater%20 Efficient%20 Landscaping%20 Design
Water%20 Efficient%20 Landscaping%20 Design
 
Progetto Lange: bootfitting o modellazione termica?
Progetto Lange: bootfitting o modellazione termica?Progetto Lange: bootfitting o modellazione termica?
Progetto Lange: bootfitting o modellazione termica?
 
Creadores de paginas web
Creadores de paginas webCreadores de paginas web
Creadores de paginas web
 
Sociala medier i styrelsearbete (In Swedish)
Sociala medier i styrelsearbete (In Swedish)Sociala medier i styrelsearbete (In Swedish)
Sociala medier i styrelsearbete (In Swedish)
 
Biodisel laboratorio
Biodisel laboratorio Biodisel laboratorio
Biodisel laboratorio
 
3.3 beslija es obreast2011final
3.3 beslija es obreast2011final3.3 beslija es obreast2011final
3.3 beslija es obreast2011final
 
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANOEl asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
El asesinato del profesor de matemáticas [2] luis y carla. LEON CANO
 
English language editing for African scientists
English language editing for African scientistsEnglish language editing for African scientists
English language editing for African scientists
 
Razas de la Horda de WoW
Razas de la Horda de WoWRazas de la Horda de WoW
Razas de la Horda de WoW
 
Campo magnético
Campo magnéticoCampo magnético
Campo magnético
 

Similar to Avoiding Spoilers On MediaWiki Fan Sites Using Memento

Blogswikisrss
BlogswikisrssBlogswikisrss
Blogswikisrss
tomlom
 
Blogs and Wikis
Blogs and Wikis Blogs and Wikis
Blogs and Wikis
kepitcher
 
Wikimedia-Architecture-More-With-Less
Wikimedia-Architecture-More-With-LessWikimedia-Architecture-More-With-Less
Wikimedia-Architecture-More-With-Less
Asher Feldman
 
Wordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for BeginnersWordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for Beginners
Bonnie Vasko
 
Wordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for BeginnersWordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for Beginners
Bonnie Vasko
 
Pundit - SemLib Annotation Tool
Pundit - SemLib Annotation ToolPundit - SemLib Annotation Tool
Pundit - SemLib Annotation Tool
SemLib Project
 

Similar to Avoiding Spoilers On MediaWiki Fan Sites Using Memento (20)

Blogswikisrss
BlogswikisrssBlogswikisrss
Blogswikisrss
 
Digital Preservation - ODU
Digital Preservation - ODUDigital Preservation - ODU
Digital Preservation - ODU
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODU
 
Semantic Media Wiki & Semantic Forms
Semantic Media Wiki & Semantic FormsSemantic Media Wiki & Semantic Forms
Semantic Media Wiki & Semantic Forms
 
Blogs and Wikis
Blogs and Wikis Blogs and Wikis
Blogs and Wikis
 
Understanding and improving Wikipedia article discussion spaces SAC2011
Understanding and improving Wikipedia article discussion spaces SAC2011Understanding and improving Wikipedia article discussion spaces SAC2011
Understanding and improving Wikipedia article discussion spaces SAC2011
 
Open Content Library LGM 2007
Open Content Library LGM 2007Open Content Library LGM 2007
Open Content Library LGM 2007
 
Make Your Own Writing Paper
Make Your Own Writing PaperMake Your Own Writing Paper
Make Your Own Writing Paper
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.is
 
Information Architecture Wikipedia Edit-a-Thon: Training Guide
Information Architecture Wikipedia Edit-a-Thon: Training GuideInformation Architecture Wikipedia Edit-a-Thon: Training Guide
Information Architecture Wikipedia Edit-a-Thon: Training Guide
 
Wikipedia workshop, SpotOn 2013 Conference
Wikipedia workshop, SpotOn 2013 ConferenceWikipedia workshop, SpotOn 2013 Conference
Wikipedia workshop, SpotOn 2013 Conference
 
Open Design Definition workshop @ Open Knowledge Festival 2012
Open Design Definition workshop @ Open Knowledge Festival 2012Open Design Definition workshop @ Open Knowledge Festival 2012
Open Design Definition workshop @ Open Knowledge Festival 2012
 
Wikimedia-Architecture-More-With-Less
Wikimedia-Architecture-More-With-LessWikimedia-Architecture-More-With-Less
Wikimedia-Architecture-More-With-Less
 
Twitch Plays Pokémon: Twitch's Chat Architecture
Twitch Plays Pokémon: Twitch's Chat ArchitectureTwitch Plays Pokémon: Twitch's Chat Architecture
Twitch Plays Pokémon: Twitch's Chat Architecture
 
Social Software To Manage Your World
Social Software To Manage Your WorldSocial Software To Manage Your World
Social Software To Manage Your World
 
Wordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for BeginnersWordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for Beginners
 
Wordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for BeginnersWordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for Beginners
 
Wordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for BeginnersWordcamp 2010 Themes for Beginners
Wordcamp 2010 Themes for Beginners
 
Pundit - SemLib Annotation Tool
Pundit - SemLib Annotation ToolPundit - SemLib Annotation Tool
Pundit - SemLib Annotation Tool
 
Web Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: IntroductionWeb Information Systems Lecture 1: Introduction
Web Information Systems Lecture 1: Introduction
 

More from Shawn Jones

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Shawn Jones
 
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
Shawn Jones
 
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Shawn Jones
 
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
Shawn Jones
 
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Shawn Jones
 
Automatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social Cards
Shawn Jones
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Shawn Jones
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Shawn Jones
 

More from Shawn Jones (16)

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
 
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P...
 
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
 
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
 
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
 
Automatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social Cards
 
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web Archives
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-It
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
 
Reference Rot
Reference RotReference Rot
Reference Rot
 
Where Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsWhere Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive Collections
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Avoiding Spoilers On MediaWiki Fan Sites Using Memento

  • 1. Avoiding Spoilers On MediaWiki Fan Sites Using Memento By Shawn M. Jones sjone@cs.odu.edu Committee: Dr. Michael L. Nelson Dr. Michele C. Weigle Dr. Irwin B. Levinstein
  • 2. Warning: This presentation may contain spoilers Motivation 2
  • 3. It Started With A Discussion At Work About This Guy From Game Of Thrones Motivation 3
  • 4. So We Use Fan Wikis, Because They Are Useful For Our Discussions Motivation 4 http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon
  • 5. But The Current Page For This Character Contains Spoilers Motivation 5
  • 6. But The Current Page For This Character Contains Spoilers Motivation 6
  • 7. But The Current Page For This Character Contains Spoilers Motivation 7
  • 8. But The Current Page For This Character Contains Spoilers Motivation 8
  • 9. But The Current Page For This Character Contains Spoilers Motivation 9
  • 10. We All Enjoy Some Episodic Fiction So Much… Motivation 10
  • 11. …That Fans Have Created Wikis… Motivation 11
  • 12. …And The Rest Of Us Read Them Motivation 12
  • 13. So, What If We Could Avoid The Spoilers By Using Past Wiki Pages? Motivation 13 http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon?oldid=125053
  • 14. Order Of Discussion • Background • Related Work • TimeGate Heuristics • Theory Of Spoiler Probability • Measurements Of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions 14
  • 15. BACKGROUND Building To The Naïve Spoiler Concept 15 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 16. Most Of Us Are Familiar With The Web Browser And HTML Background 16 The Web Browser is how we view web pages HTML is what the browser parses to render the page
  • 17. Hypertext Transfer Protocol (HTTP) Is What Delivers Pages To The Browser Background 17
  • 18. Here Is An Example HTTP Request From Google Chrome Background 18
  • 19. And The Interesting Parts Of Our Request… Background 19 Get me /wiki/The_Hunger_Games from en.wikipiedia.org
  • 20. Here Is An Example HTTP Response From Wikipedia’s Server For That Page Background 20
  • 21. And The Interesting Parts Of Our Response… Background 21 OK, I Have What You Are Looking For This Is How Big It Is This Is When It Was Last Changed Newline Indicating Start Of HTML HTML Starts Here And You Know How Big It Is, So Stop Reading When You Get 13769 Bytes
  • 22. Introducing Web Architecture: Resources Can Have Many Representations Background 22
  • 23. In 1996 Tim Berners-Lee Discussed Different Dimensions Of Representation http://www.w3.org/DesignIssues/Generic.html Background 23
  • 24. These Representations Are Requested And Identified By HTTP Headers Request Header Response Header Dimension Accept Content-Type Content type of Representation Accept-Language Content-Language Language of Representation Accept-Encoding Content-Encoding Medium of Representation Accept-Charset Content-Type Character Set of Representation Background 24 These Headers Were Specified In RFC 2616 Their Updated Specification Is In RFC 7231
  • 25. These Representations Are Requested And Identified By HTTP Headers Request Header Response Header Dimension Accept Content-Type Content type of Representation Accept-Language Content-Language Language of Representation Accept-Encoding Content-Encoding Medium of Representation Accept-Charset Content-Type Character Set of Representation Background 25 The term for this is Content Negotiation It works so well that many are unaware of its ubiquity Content Negotiation Does Not Create Representations, It Only Directs The User To Ones That Already Exist…
  • 26. Using Our Headers Example, Chrome Tells The Server What You Want Background 26 I prefer image/webp, and I want it compressed with gzip, and I want it in US English
  • 27. Using Our Headers Example, Wikipedia Tells Chrome What It Returns Background 27 My response is compressed with gzip, and My response is in US English This response is in text/html with a character set of UTF-8
  • 28. Memento Finally Completes The Set By Including Time Request Header Response Header Dimension Accept Content-Type Content type of Representation Accept-Language Content-Language Language of Representation Accept-Encoding Content-Encoding Medium of Representation Accept-Charset Content-Type Character Set of Representation Accept-Datetime Memento-Datetime Time of the Representation Background 28 RFC 7089 Defines Memento, Allowing Us To Negotiate In Time
  • 29. Memento Works In Several Steps Background 29 Current Page Archived Page From The Past Resource That Redirects To Archived Pages
  • 30. At First The Client Hopes To Get A TimeGate URI From the Server, But… Background 30 Current Page Archived Page From The Past Resource That Redirects To Archived Pages Most Servers Do Not Yet Return This Data So Memento Clients Default To Known TimeGates
  • 31. Then, A Browser Asks A TimeGate For A Specific Page From A Specific Datetime Background 31 I Want This Page On This Date
  • 32. And The TimeGate Tells The Browser Where It Can Go Background 32 Go To This URI To Get The Best Memento For The Date You Requested Do Not Stop Here, I Am Redirecting You Elsewhere
  • 33. Then The Browser Gets A Memento Like Any Other Web Page Background 33 This Is The Datetime Of This Memento Here’s Everything I Know About Archives For That Page
  • 34. Memento Also Provides TimeMaps As A Machine-Readable List Of Mementos Background 34
  • 35. Memento For Chrome Lets Users “Right-Click Into The Past” Background 35 http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
  • 36. Mink Lets Users See All Mementos For A Page Background 36 http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
  • 38. How Are Mementos Acquired? Background 38
  • 39. The Internet Archive Provides The Wayback Machine Background 39 http://archive.org/web/
  • 40. Using The Wayback Machine One Can Select A URI And Datetime… Background 40 http://web.archive.org/web/*/http://lostpedia.wikia.com
  • 41. …And See The Memento From The Internet Archive Background 41 http://web.archive.org/web/20140214065217/http://lostpedia.wikia.com/wiki/Main_Page
  • 42. The Internet Archive Contains Mementos For Wiki Pages Background 42 http://web.archive.org/web/20040204011112/http://en.wikipedia.org/wiki/Abraham_Lincoln
  • 43. Here Is An Example Wiki Article… Background 43 http://en.wikipedia.org/wiki/Abraham_Lincoln
  • 44. Wikis Are Edited By Multiple Authors, Resulting In Many Revisions… Background 44 http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=history
  • 45. Most Interesting: Wikis Keep EVERY REVISION OF A PAGE!!! Background 45 http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&action=history
  • 46. With Every Revision Preserved, One Can Visit Old Revisions Of Pages Background 46 http://en.wikipedia.org/w/index.php?title=Abraham_Lincoln&oldid=345783631
  • 47. For A Wiki, Every Web Archive Memento Can Be Tied To A Wiki Revision Background 47
  • 48. Background 48 We Also Know Which Revisions Were Missed… For A Wiki, Every Web Archive Memento Can Be Tied To A Wiki Revision The Web Archives Are Sampling The Wiki Revisions…
  • 49. Consider The Timeline Of Every Episode In A Series Background 49
  • 50. Now Consider A Timeline Of Wiki Revisions Created By Fans Background 50
  • 51. Finally Consider A Third Timeline For Mementos Created From Those Revisions Background 51
  • 52. Steiner Noticed That Events Inspire Wiki Revisions Background 52 … or episodes inspire wiki edits. http://arxiv.org/abs/1303.4702
  • 53. Bringing Us To The Naïve Spoiler Concept: Revisions After An Episode Contain Spoilers Background 53
  • 54. Bringing Us To The Naïve Spoiler Concept: Revisions After An Episode Contain Spoilers Background 54
  • 55. RELATED WORK We Are Not The Only Ones Seeking To Avoid Spoilers… 55 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 56. So, What Have Others Done About Spoilers? Related Work 56 Spoiler Notices!!! Hidden Text!!! Blocking Social Media Posts!
  • 57. Existing Academic Studies Have Dealt With Social Media • Separate studies conducted by Johns and the team of Schirra, Sun, and Bently studied two-screen viewing • Boyd-Graber, Glasgow, and Zajac attempted to use machine learning to find spoilers in social media • To avoid spoilers fans would avoid, or abandon: – Social media – Online web pages – TV shows • This results in lost revenue to advertisers! Related Work 57
  • 58. Our Goals • Work with wikis, not social media • Not just warn the user! • Not hide the data! • Show the user what existed before the spoiler was revealed, so the resource is still useful. Related Work 58
  • 59. TIMEGATE HEURISTICS TimeGates Hold The Key To Avoiding Spoilers 59 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 60. TimeGates Can Be Represeted By A Function TimeGate Heuristics 60 M = memento returned (URI-M) R = original resource (URI-R) ta = desired datetime h = heuristic being used for TimeGate
  • 61. mindist Is The Most Widely Used Heuristic TimeGate Heuristics 61 Minimum Distance From ta With No Bounds
  • 62. But mindist Can Lead To Spoilers… TimeGate Heuristics 62 Minimum Distance – no bounds spoiler
  • 63. minpast Does Not Lead To Spoilers TimeGate Heuristics 63 Minimum Distance In The Past Where Upper Bound = ta
  • 64. minfutr Is The Opposite Of minpast TimeGate Heuristics 64 Minimum Distance In The Future Where Lower Bound = ta
  • 65. minfutr Always Leads To Spoilers TimeGate Heuristics 65 Minimum Distance In The Future Where Lower Bound = ta spoiler
  • 66. Other Heuristics Can Lead To Spoilers • minnear – bounds specified by user/system • eqpast – compare on both sides of ta, pick past if equal, mindist if not • eqfutr – compare on both sides of ta, pick future if equal, mindist if not • simpast – compare on both sides of ta, pick past if similar, mindist if not • simfutr – compare on both sides of ta, pick future if similar, mindist if not TimeGate Heuristics 66
  • 67. We Compared These Heuristics Based On Performance And Spoiler Avoidance TimeGate Heuristics 67
  • 68. Even If Modified To Default To minpast, These Heuristics Perform More Poorly TimeGate Heuristics 68
  • 69. minpast Is Best For Avoiding Spoilers TimeGate Heuristics 69
  • 70. THEORY OF SPOILER PROBABILITY Using mindist Can Be Hazardous For Avoiding Spoilers 70 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 71. Remember: Consider The Timeline Of Every Episode In A Series Background 71
  • 72. Remember: Consider A Timeline Of Wiki Revisions Created By Fans Background 72
  • 73. Remember: Consider A Third Timeline For Mementos Created From Those Revisions Background 73
  • 74. Remember Our Naïve Spoiler Concept Theory of Spoiler Probability 74
  • 75. Spoiler Areas Exist Where mindist Returns The User A Memento From The Future Of Their Desired Datetime Theory of Spoiler Probability 75 In this example, using mindist, if the user requests a memento with a datetime between t9 and t11, denoted by the red area, they will get mk (which is rj) , which exists in the future, even though they chose a datetime before mk!
  • 76. Conditions Arising From These Timelines Are Defined By When They Occur • Pre-Archive – occurs prior to first memento • Archive-Extant – occurs after first memento • Post-Archive – occurs after last memento Theory of Spoiler Probability 76
  • 77. Condition: Pre Archive Spoiler Area Theory of Spoiler Probability 77 Spoiler area exists for e3, and one still exists for e2 because m1 maps to rj, which is after e3
  • 78. Condition: Pre Archive Safe Area Theory of Spoiler Probability 78 No spoiler area for e3, but one still exists for e2 because m1 maps to rj, but rj is before e3
  • 79. Condition: Archive Extant Spoiler Area Theory of Spoiler Probability 79
  • 80. Condition: Archive Extant Safe Area EHR Theory of Spoiler Probability 80
  • 81. Condition: Archive Extant Safe Area HRE Theory of Spoiler Probability 81
  • 82. The Area Between the First and Last Episodes Is A Potential Spoiler Zone Theory of Spoiler Probability 82
  • 83. Using Spoiler Areas And A Potential Spoiler Zone, We Can Calculate The Probability Of Spoiler For A Page Theory of Spoiler Probability 83 s = # of seconds where we are in a spoiler area c = # of seconds between e1 and en
  • 84. MEASUREMENTS OF SPOILER PROBABILITY Actual Spoiler Areas From Actual Wikis 84 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 85. Sixteen Lucky Fan Wikis Were Selected From wikia.com… Measurements of Spoiler Probability 85
  • 86. We See That Not All Pages Are Archived Measurements of Spoiler Probability 86 None of these are 100% In fact, 38% of the total is not available in the Internet Archive.
  • 87. For Pages With Mementos We Sought To Find Spoiler Areas Measurements of Spoiler Probability 87 [ (ts1, tf1), (ts2, tf2), … (tsn, tfn) ]List of Spoiler Areas Pre-Archive Equation Archive-Extant Equation Revisions from Wiki export pages Mementos from Internet Archive TimeMaps Episode Dates From epguides.com
  • 88. Spoiler Areas Do Exist! Measurements of Spoiler Probability 88
  • 89. Spoiler Areas Do Exist! Measurements of Spoiler Probability 89 One Can See Whole Seasons Clumped Together Fewer Mementos Result In Fewer Chances of Spoiler
  • 90. We Discovered Different Categories of Pages • Normal • Wiki-Before-Show • Season-In-A-Day Measurements of Spoiler Probability 90
  • 91. Normal Pages Are Started After The Series Starts Measurements of Spoiler Probability 91
  • 92. Normal Pages Are Started After The Series Starts Measurements of Spoiler Probability 92 Pre-Archive Spoiler Areas Begin At First Episode
  • 93. Wiki-Before-Show Pages Are Started Prior To The First Episode Measurements of Spoiler Probability 93
  • 94. Season-in-a-Day Pages Have Many Episodes In A Single Day Measurements of Spoiler Probability 94
  • 95. For Normal Pages, We See A Variety of Probabilities Measurements of Spoiler Probability 95
  • 96. For Wiki-Before-Show Pages, We Also See A Variety of Probabilities Measurements of Spoiler Probability 96
  • 97. Our Model Breaks Down For Season-In-A-Day Measurements of Spoiler Probability 97 13 spoiler areas exist for this page, but they all have length 0
  • 98. The Probabilities Do Not Follow A Known Distribution Measurements of Spoiler Probability 98
  • 99. 50% Of The Pages Have A Spoiler Probability < 0.66 Measurements of Spoiler Probability 99
  • 100. Background 100 These Revisions Were Missed By The Web Archive… Remember Missed Updates?
  • 101. We Had An Opportunity To Measure Missed Updates Measurements of Spoiler Probability 101
  • 102. We See Lines Where The Archive Adopts A More Aggressive Policy Measurements of Spoiler Probability 102 Notice How The Colors To The Right Get Lighter, Indicating Fewer Missed Updates
  • 103. Looking At Redundant Mementos, We See The Opposite Effect Measurements of Spoiler Probability 103 Notice How The Colors To The Right Get Darker, Indicating More Redundant Mementos
  • 104. Looking At Redundant Mementos, We See The Opposite Effect Measurements of Spoiler Probability 104 Remember That 38% of the Pages Do Not Have Mementos, So This Does Not Reflect All Archived Pages
  • 105. SPOILERS IN THE WAYBACK MACHINE The Wayback Machine Uses mindist, Ergo Users Encounter Spoilers There… 105 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 106. We Have Access To Some Anonymized Logs From The Wayback Machine Spoilers In The Wayback Machine 106 From these logs, we have: • which memento the user came from (the referer) • the memento returned to the user The Wayback Machine Rewrites URIs, Redirecting Users When They Click On Links To Other Pages That Exist During The Same Time Period.
  • 107. We Can Infer Desired Datetime And Have The Memento-datetime Spoilers In The Wayback Machine 107 From the referrer, we can infer their desired datetime From the memento returned to the user, we can acquire the Memento-Datetime
  • 108. Using the Logs, We Can See How Many wikia.com Requests End In Spoilers Spoilers In The Wayback Machine 108 Revisions From Wiki Export Pages Wayback Logs Desired Datetime From Referrer Memento-datetime From Visited URI Datetime Of Revision Did The Wayback Machine Deliver The User To A Revision That Existed In The Future From The Desired Datetime? Safe Spoiler No Yes
  • 109. Wayback Machine Log Results For wikia.com Requests Spoilers In The Wayback Machine 109 We Have No Way To Identify Wikis Other Than By Domain Name, Data Does Not Include All Wikis…
  • 110. THE MEMENTO MEDIAWIKI EXTENSION A Solution For Avoiding Spoilers In Wikis. Wiki Revisions Are Mementos, Too! 110 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 111. We Developed The Memento MediaWiki Extension To Use minpast The Memento MediaWiki Extension 111 MediaWiki Is The Software Used By Wikipedia And Wikia.Com http://www.mediawiki.org/wiki/Extension:Memento
  • 112. We Set Up A Demonstration Wiki… The Memento MediaWiki Extension 112 http://ws-dl-05.cs.odu.edu/demo/index.php/Main_Page Data was exported from: http://awoiaf.westeros.org
  • 113. For Performance, We Had An Opportunity To Experiment With Alternate Memento Patterns The Memento MediaWiki Extension 113
  • 114. Original Resource Acts As Own TimeGate (Pattern 1.1) The Memento MediaWiki Extension 114 This Pattern Only Requires 2 Requests To Acquire A Memento. Intuitively, It Should Perform Better.
  • 115. Client Must Ask About TimeGate (Pattern 2.1) The Memento MediaWiki Extension 115 This Pattern Requires 3 Requests To Acquire A Memento. Intuitively, It Should Perform Worse.
  • 116. Using Analysis and Experimentation We Found That The Three-Request Pattern Actually Performed Better The Memento MediaWiki Extension 116 Mediawiki Takes Too Long To Generate A TimeGate Response If The Two Are The Same Page. The Three-request Pattern Requires An Extra Request, But, Due To Round-trip Time, It Performs Better Unless The User Has A Bandwidth Of 21,926 bps Or Less.
  • 117. Serving Current Pages Takes The Same Time With Or Without The Extension The Memento MediaWiki Extension 117 Worse Performance Better Performance We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876 Mean = -0.0072 s Std dev = 0.3526 s
  • 118. Serving Old Pages (Mementos) Takes The Same Time With Or Without The Extension The Memento MediaWiki Extension 118 Better Performance Worse Performance We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876 Mean = -0.0026 s Std dev = 0.0421 s
  • 119. TimeMaps Are Smaller Than Wiki History Pages, So They Perform Better The Memento MediaWiki Extension 119 Better Performance Mean = -34.74 kB Std dev = 31.12 kB We used seige for testing, as detailed in http://arxiv.org/abs/1406.3876
  • 120. MediaWiki Does Not Just Store Previous Revisions Of Pages… The Memento MediaWiki Extension 120 Images, Stylesheets, and JavaScript have previous revisions stored as well
  • 121. Then We Wondered, What About Images? They Could Contain Spoilers… The Memento MediaWiki Extension 121
  • 122. Consider This Map This Map is important to understanding the content of this article This image is changed as the article is changed, to reflect its content The Memento MediaWiki Extension 122 http://en.wikipedia.org/wiki/Same-sex_marriage_law_in_the_United_States_by_state
  • 123. It’s The Same Map If Today We Visit The June 5, 2013 Revision Users can't view this embedded resource as it looked on June 2013 while reading the article from that time period http://en.wikipedia.org/w/index.php?title=Same-sex_marriage_law_in_the_United_States_by_state&oldid=558400004 123The Memento MediaWiki Extension
  • 124. What Should Have Happened This is the the map from June, 2013 that should have been displayed This is the current map The content of the article won't match the data in this visual aid, possibly confusing a user who wanted historical information on this topic The Memento MediaWiki Extension 124
  • 125. We Developed A Solution For Images The $file argument’s getHistory() function of the ImageBeforeProduceHTML hook can be used to acquire previous revisions of images The Memento MediaWiki Extension 125
  • 126. We Could Not Extract Previous Revisions Of CSS And Javascript… The data is present, but we could not find any way for an extension to access or render it. The Memento MediaWiki Extension 126
  • 127. We Demonstrated Avoiding Spoilers At WikiConUSA 2014… http://ws-dl.blogspot.com/2014/06/2014-06-02-wikiconference-usa-2014-trip.html 127The Memento MediaWiki Extension
  • 128. Even The Current Version Of The Page Contains A Spoiler!!! We want to find information about Kevan Lannister, but haven’t read the book A Dance with Dragons yet. We set the Memento Chrome Extension prior to the release of that book: June 29, 2011. 128The Memento MediaWiki Extension
  • 129. So We Set Memento For Chrome To The Correct Date… We use the Memento Chrome Extension to request a revision of the page close to, but not over, our requested date. 129The Memento MediaWiki Extension
  • 130. …And Got A Page Without Spoilers And We Avoid Spoilers for A Dance With Dragons… 130The Memento MediaWiki Extension
  • 131. FUTURE WORK Where Do We Go From Here? 131 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 132. How Do We Handle Spoilers For Season-In-A-Day Series? Future Work 132
  • 133. Can We Create A New Heuristic Based On mindist For Spoiler Detection? Future Work 133 Can we process the content of m3 and redirect the user to m2 if we detect spoilers?
  • 134. Can We Use The Extension To Avoid Spoilers For Sports And The News? Future Work 134
  • 135. How Do We Use minfutr And minpast To Study Emerging Topics On Wikipedia Future Work 135
  • 136. We Can Do Further Work On Missed Updates And Redundant Mementos Future Work 136
  • 137. CONCLUSIONS We Got Here… 137 • Background • Related Work • TimeGate Heuristics • Theory of Spoiler Probability • Measurements of Spoiler Probability • Spoilers In The Wayback Machine • The Memento MediaWiki Extension • Future Work • Conclusions
  • 138. We Introduced The Naïve Spoiler Concept Conclusions 138
  • 139. We Have Detailed TimeGate Heuristics Conclusions 139
  • 140. We Showed How To Calculate The Probability Of Encountering A Spoiler Conclusions 140 s = # of seconds where we are in a spoiler area c = # of seconds between e1 and en
  • 141. We Calculated Real Spoiler Probabilities Conclusions 141
  • 142. We Showed That the Wayback Machine Is Serving Spoilers Conclusions 142
  • 143. We Developed The Memento MediaWiki Extension To Use minpast Conclusions 143
  • 144. Most of All We Showed That It Is Possible To Avoid Spoilers In MediaWiki Fan Sites Using Memento Conclusions 144 http://gameofthrones.wikia.com/wiki/Joffrey_Baratheon?oldid=125053 Accept-Datetime: Sun, 13 April 2014 00:59:00 GMT
  • 145. Papers/Presentations • “Avoiding Spoilers in Fan Wikis of Episodic Fiction”, with M. L. Nelson (in preparation). • “Using the Memento MediaWiki Extension To Avoid Spoilers”, Presentation, WikiConference USA 2014, June 2014. • “Reconstructing the Past With MediaWiki: Programmatic Issues and Solutions”, Presentation, WikiConference USA 2014, June 2014. • “Bringing Web Time Travel to MediaWiki: An Assessment of the Memento MediaWiki Extension”, with M. L. Nelson, H. Shankar, and H. Van de Sompel, Tech. Rep. avXiv:1406.3, Old Dominion University, June 2014. Papers/Presentations 145
  • 146. BACKUP SLIDES Because What You Have Seen Is Just The Tip Of The Iceberg… 146
  • 148. Existing Studies: Leavitt and Christenfeld Related Work 148
  • 149. Existing Studies: Release Date Different Per Country • Schirra, Sun, and Bently’s live-tweeting study on Downton Abbey unearthed a global problem of avoiding spoilers in shows that air later for fans in a different country. • This is further echoed by Leaver’s essay The Tyranny of Digital Distance discussing the release of Battlestar Galactica episodes in Australia. Related Work 149
  • 150. Existing Studies: Johns • Johns studied two-screen viewing (generic name for live tweeting) • Fans of particular shows would avoid social media until they had viewed the latest show • These fans were trying desperately to avoid spoilers • This contradicts Leavitt’s study Related Work 150
  • 151. Typically Notices Warn Us of Spoilers Related Work 151
  • 152. Some Sites Try To Hide The Spoilers With HTML And/Or Javascript Related Work 152
  • 153. Existing Academic Studies Have Dealt With Social Media • Separate studies conducted by Johns and the team of Schirra, Sun, and Bently studied two-screen viewing • Boyd-Graber, Glasgow, and Zajac attempted to use machine learning to find spoilers in social media • To avoid spoilers fans would avoid, or abandon: – Social media – Online web pages – TV shows • This results in lost revenue to advertisers! Related Work 153
  • 154. Existing Studies: Tsang and Yan • Fans have abandoned – Social media – Online web pages – TV shows • Because of the issues in avoiding spoilers • This results in lost revenue to advertisers! Related Work 154
  • 155. Hiding Spoilers: Microformats Artjom Kurapov created a draft HTML microformat for classifying links and images. Related Work 155
  • 157. Hiding Spoilers: Tumblr Savior Related Work 157
  • 158. Hiding Spoilers: Netflix Spoiler Foiler Related Work 158
  • 159. Hiding Spoilers: Netflix Spoiler Foiler Related Work 159
  • 160. Spoiler Shield Is A Spoiler Blocking Tool For Social Media Related Work 160
  • 161. Hiding Spoilers: Facebook Posts Filter And Open Tweet Filter Any Facebook posts with these keywords will not be displayed. Any tweets with these keywords will not be displayed. Related Work 161
  • 162. Archiving In MediaWiki: Parsoid Related Work 162
  • 163. Archiving In MediaWiki: Collection Extension Related Work 163
  • 164. Seamlessly Viewing Past Versions Of MediaWiki Pages Related Work 164
  • 165. Seamlessly Viewing Past Versions Of MediaWiki Pages Wikipedia Proxy Does Not Address The Problem For All Wikis. To Be Generic For Any Wiki, The Wiki Server Itself Would Need To Send A Link Header Back Indicating The Uri Of The TimeGate To Use. Related Work 165
  • 166. ALTERNATIVES TO THE MEMENTO MEDIAWIKI EXTENSION (AND WHY IT IS BETTER!) Backup Slides 166
  • 167. Memento Extension vs. Manually Getting Page Revision Why Do It When Memento Will Do It For You? This Is Very Time Consuming. Memento Let’s You Browse Through The Whole Web With A Given Date! 167
  • 168. Memento Extension vs. MediaWiki API JSON: {"revid":607345961,"parentid":607210719,"timestamp":"2014-05-06T16:07:52Z”} XML: <rev revid="607519915" parentid="607345961" user="Marklemagne" timestamp="2014-05-07T19:00:26Z"/> Only A Custom MediaWiki Client Can Turn These Oldid Entries Into Uris. Memento Is A Web Standard Way Of Accessing Past Web Resources And Is Already Implemented For Many Different Applications (Web Archives, Etc.) 168
  • 169. Memento Extension vs. MediaWiki API Link: <http://ws-dl-05.cs.odu.edu/demo-302-recommended- relations/index.php/Daenerys_Targaryen>; rel="original latest-version", <http://ws-dl-05.cs.odu.edu/demo-302-recommended- relations/index.php/Special:TimeGate/Daenerys_Targaryen>; rel="timegate", <http://ws-dl-05.cs.odu.edu/demo-302-recommended- relations/index.php/Special:TimeMap/Daenerys_Targaryen>; rel="timemap"; type="application/link-format"; from="Sun, 22 Apr 2007 15:01:20 GMT"; until="Fri, 27 Sep 2013 20:48:24 GMT", <http://ws-dl-05.cs.odu.edu/demo-302-recommended- relations/index.php?title=Daenerys_Targaryen&oldid=1499>; rel="first memento"; datetime="Sun, 22 Apr 2007 15:01:20 GMT", <http://ws-dl-05.cs.odu.edu/demo-302-recommended- relations/index.php?title=Daenerys_Targaryen&oldid=107643>; rel="last memento"; datetime="Fri, 27 Sep 2013 20:48:24 GMT" Memento Also Follows The RESTful Principle Of “Follow Your Nose”, Indicating Additional Resources To Access From Here. 169
  • 170. Memento Extension vs. Internet Archive The Internet Archive Only Gets Some Of The Revisions Of A Given Page. MediaWiki Has All Of The Revisions Of A Given Page.
  • 171. Memento Extension vs. Other MediaWiki Time Travel Extensions While These Extensions Just Work For MediaWiki, Memento Works For The Entire Web. With The Memento Extensions, One Can Browse The Entire Web Spoiler Free By Seamlessly Accessing Web Archives And Other Resources Through Memento. 171
  • 172. MEMENTO PATTERN COMPARISON IN THE MEMENTO MEDIAWIKI EXTENSION Backup Slides 172
  • 173. Assuming An Original Resource Is a TimeGate (Pattern 1.1) The Memento MediaWiki Extension 173
  • 174. Looking For A TimeGate (Pattern 2.1) The Memento MediaWiki Extension 174
  • 175. Comparing These Two Involves Evaluating Their Performance The Memento MediaWiki Extension 175 a = time to generate just a normal wiki page b = time to perform datetime negotiation when the TimeGate is the same B = time to perform datetime negotiation when the TimeGate is different M = time to generate memento RTTa, RTTb, RTTB, RTTM = round trip times for a, b, B, and M
  • 176. Using curl We Obtained the Value of a The Memento MediaWiki Extension 176 With caching, a turns out to be about 0.1 seconds on average
  • 177. Using seige We Obtained Values for b and B The Memento MediaWiki Extension 177 From these results:
  • 178. Comparing TimeGate Implementations The Memento MediaWiki Extension 178 Worse Performance
  • 179. Using Analysis We Worked To Obtain The Value Of RTTa The Memento MediaWiki Extension 179 Round Trip Time is the sum of transmission delay and propagation delay Transmission delay is # of bits divided by the rate of transmission Requests and responses are typically about 11,840 bits Assuming a worst case of 1G telephony (28,000 bps), dt = 0.41 s Using our previous values for a, b, and B, we see that such a 1G user would need to experience a dp of 0.13s for the two- request pattern to perform better.
  • 180. Continuing Our Analysis We Obtained The Value Of RTTa 0.13s sounds small, but at the speed of light, this would require the user to be 24,216.7 miles from the server This is almost the circumference of the Earth!!! The Memento MediaWiki Extension 180
  • 181. So, At What Value Of dt Does The Two-Request Pattern Win Out? The Memento MediaWiki Extension 181 For most users, the three-request pattern performs better! At bandwidth less than 21,926 bps, the two-request pattern wins out That’s slightly better than 1G telephony!
  • 182. ADDITIONAL ARCHIVE EXTANT SAFE AREAS Backup Slides 182
  • 183. These Are Identified By The Order Of Occurrence Of Halfway, Revision, and Event • For example – HRE – means: 1. Halfway mark between two mementos 2. Revision is created 3. Event occurs – ERH – means: 1. Event occurs 2. Revision is created 3. Halfway mark between two mementos Theory of Spoiler Probability 183
  • 184. Condition: Archive Extant Safe Area REH Theory of Spoiler Probability 184
  • 185. Condition: Archive Extant Safe Area RHE Theory of Spoiler Probability 185
  • 186. Condition: Archive Extant Safe Area ERH Theory of Spoiler Probability 186
  • 187. Condition: Post Archive Safe Area RE Theory of Spoiler Probability 187 No spoilers after last memento mn
  • 188. Condition: Post Archive Safe Area ER Theory of Spoiler Probability 188 No spoilers after last memento mn
  • 189. ARCHITECTURE OF THE MEMENTO MEDIAWIKI EXTENSION Backup Slides 189
  • 190. We Used Class Inheritance For the Different Memento Resource Types The Memento MediaWiki Extension 190 http://www.mediawiki.org/wiki/Extension:Memento
  • 191. MediaWiki SpecialPages Invoke TimeGate And TimeMap Functionality The Memento MediaWiki Extension 191 http://www.mediawiki.org/wiki/Extension:Memento
  • 192. All Datetime Negotiation Is Centralized In The TimeNegotiator Class The Memento MediaWiki Extension 192 http://www.mediawiki.org/wiki/Extension:Memento
  • 193. The Memento Class Is The Entry Point For The Extension The Memento MediaWiki Extension 193 http://www.mediawiki.org/wiki/Extension:Memento
  • 195. MediaWiki Still Has CSS Issues The Memento MediaWiki Extension 195
  • 196. Other Uses For The Memento MediaWiki Extension Evolving laws and legal discourse Past software contributions (Folding@Home) Changing relationship between organizations (ICANN vs. Verisign) 196