Drexel CCI IS Department Distinguished Speaker Series, 2020-03-09
@phonedude_mln, @WebSciDL
Web Archives at the Nexus of
Good Fakes and Flawed Originals
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln
With:
ODU: Michele C. Weigle, John Berlin, Mohamed Aturban, Sawood Alam
LANL: Martin Klein, DANS: Herbert Van de Sompel
Supported in part by The Andrew Mellon Foundation
and the National Science Foundation
"You’re in a desert walking along in the
sand when all of a sudden you look down,
and you see a tortoise..."
https://en.wikipedia.org/wiki/Blade_Runner
National Film Registry Induction, 1993: https://www.loc.gov/loc/lcib/94/9405/film.html
http://www.loc.gov/static/programs/national-film-preservation-board/documents/blade_runner.pdf
Film (1982), based on the novel (1968)
https://www.youtube.com/watch?v=LwDdP88Dr54
We’re not going to review RS’s/PKD’s predictions
https://www.cnn.com/2018/12/28/movies/blade-runner-predictions-2019-trnd/
https://twentytwowords.com/blade-runner-was-set-in-2019/
https://nwn.blogs.com/nwn/2019/01/blade-runner-los-angeles-2019.html
https://www.theregister.co.uk/2019/01/01/blade_runner_today/
Common themes in the works of Philip K. Dick
• identity
• self vs. the other
• memory
• humanity
• authenticity
• reality vs. simulacra
• unreliable narrator
Blade Runner in 279 characters
Voight-Kampff Test: distinguishing
authentic (humans) vs. fake (replicants)
https://www.youtube.com/watch?v=ic0PuvJbdu0
You’re in a desert walking along in the sand when all of a sudden you look down,
and you see a tortoise. You reach down, you flip the tortoise over on its back.
The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying
to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?
Robots indistinguishable from humans,
off-world slaves, perpetually “dark and stormy”
Los Angeles – all good cyberpunk sci-fi tropes –
but that’s not our 2019, right?
The future is already here —
it's just not evenly distributed.
-- William Gibson (yes, I’m mixing sci-fi authors)
https://twitter.com/badnetworker/status/1093864777179430912
https://geekologie.com/2018/02/boston-dynamics-tests-door-opening-robot.php
“So when do we get to that part about
web archiving?”
Web archives are science fiction.
Web archives are enabling a reality, as
foreseen by PKD and other sci-fi authors,
where we can insert bespoke fakes
into our collective memory.
Web archives are like science fiction
because they’re a paradox:
We need a significant and continuous
technology investment today to be able to
say a page “used to look like this.”
Web archiving is not file backup.
Backup = prevent, detect, repair changes
Web archiving = continuous change to better simulate the past
Web archiving is a simulacrum of the past
https://makeagif.com/gif/blade-runner-jf-sebastians-toys-kaiser-and-bear-AFkWpp
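The backup side of that contrast (detect changes) can be illustrated with a fixity check. This is a minimal sketch, assuming SHA-256 as the digest; the function names and the sample bytes are hypothetical and not from the talk.

```python
import hashlib


def fixity_digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a stored payload."""
    return hashlib.sha256(payload).hexdigest()


def has_changed(payload: bytes, recorded_digest: str) -> bool:
    """Backup-style check: any difference from the recorded digest is a fault."""
    return fixity_digest(payload) != recorded_digest


# Hypothetical stored bytes and the digest recorded at ingest time.
original = b"<html><a href='/about'>About</a></html>"
recorded = fixity_digest(original)

# A replay system rewrites links on purpose, so the served bytes differ.
replayed = original.replace(
    b"/about", b"/web/19970626040823/http://www.drexel.edu/about"
)
```

In this sketch the stored bytes pass the check and the replayed bytes fail it, which suggests why a fixity check applies to what an archive stores rather than to what it serves on replay.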
The essence of a web archive
is to modify its holdings during replay
https://web.archive.org/web/19970626040823/http://www.drexel.edu/
Rewrite links so they point back in the archive
Provide archival metadata banner (what, when, how many)
Relatively simple for the Web of 1997. Today, it’s not so easy.
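As a rough illustration of link rewriting, the sketch below prefixes href and src values with an archive replay path. It is not the Wayback Machine's or pywb's actual rewriter: the regex approach, the `rewrite_links` name, and the sample HTML are assumptions for illustration, and real rewriters also have to handle CSS, JavaScript, and many edge cases.

```python
import re
from urllib.parse import urljoin

# Replay prefix as seen in the web.archive.org URL on this slide.
ARCHIVE_PREFIX = "https://web.archive.org/web/"

# Matches double-quoted href/src attributes only (a deliberate simplification).
ATTR_RE = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]*)"', re.IGNORECASE)


def rewrite_links(html: str, timestamp: str, base_url: str) -> str:
    """Point every double-quoted href/src back into the archive."""

    def _rewrite(match: re.Match) -> str:
        # Resolve relative links against the page's original URL first.
        absolute = urljoin(base_url, match.group("url"))
        return f'{match.group("attr")}="{ARCHIVE_PREFIX}{timestamp}/{absolute}"'

    return ATTR_RE.sub(_rewrite, html)


# Hypothetical 1997-style markup.
page = '<a href="/admissions/">Admissions</a> <img src="logo.gif">'
rewritten = rewrite_links(page, "19970626040823", "http://www.drexel.edu/")
```

Static markup like this is the easy case; links built at runtime by JavaScript never appear in the HTML for a rewriter like this one to find.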
Or modify your environment to
better emulate the past
http://oldweb.today/ie4/19970101000000/www.drexel.edu
“Yo Dawg, I heard you like browsers…” https://imgflip.com/i/3rtfws
A browser inside the browser, in this case IE4 for Windows (typical for 1997).
Network requests trapped & transformed instead of pages.
Archival metadata panel
Some modifications are to make yesterday’s
formats safe for / available to today’s browser
http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html
Cf. https://techcrunch.com/2017/07/25/get-ready-to-say-goodbye-to-flash-in-2020/
http://web.archive.org/web/20100605013233/http://www.youtube.com/watch?v=1aPPSIDr3Mc&feature=player_embedded/
Web archive software is continuously evolving, in part to
better realize a more authentic version of the past
https://github.com/internetarchive/wayback/releases
https://github.com/webrecorder/pywb/releases
"...the government presented testimony from the office
manager of the Internet Archive, who explained how the
Archive captures and preserves evidence of the contents
of the internet at a given time. The witness also compared
the screenshots sought to be admitted with true and
accurate copies of the same websites maintained in the
Internet Archive, and testified that the screenshots were
authentic and accurate copies of the Archive’s records.
Based on this testimony, the district court found that the
screenshots had been sufficiently authenticated."
https://law.justia.com/cases/federal/appellate-courts/ca2/17-2479/17-2479-2018-07-02.html
Evidentiary use of “screenshots” of archived pages
United States v. Gasperini, No. 17-2479 (2d Cir. 2018)
Screenshots matching IA’s records are not the
same thing as IA’s records matching the past…
So why is it so hard to recreate the past?
If we just had isolated, static pages
(jpegs, pdfs, mp3s, etc.)
then there’d be no problem.
Real HTML pages are complex
• personalization
• Javascript (modifying the page)
• embedded resources (possibly including other HTML pages via iframes)
• links
Javascript
is why we can’t have nice (archival) things
Load the archived page, get an eagle
https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
Hit “reload”, get a tiger
Hit “reload” again, get a mountain
“I've done questionable things.”
Actually, the fws.gov example was super easy;
most changes are much harder to trace
Mohamed Aturban, unpublished, memento:
http://web.archive.org/web/20130724144801/http://www.cnn.com/
Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
Embedded resources + Javascript =
Our simulation of what CNN.com looked
like then is flawed.
It will never be 2013 again, so in some
sense that page is lost.
Zombies: live web “leaking” into an archived page
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
this page is from 2008
this ad is from 2012 (when this screen shot was taken)
As of late 2017, zombies mostly no longer occur
https://blog.dshr.org/2017/09/attacking-users-of-wayback-machine.html
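One way to look for zombies is to list the embedded resources in a replayed page whose URLs do not point back into the archive. This is a minimal sketch, assuming the archive's replay prefix is known in advance; the `LeakFinder` class, the prefix check, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LeakFinder(HTMLParser):
    """Collect absolute src URLs that escape the archive's replay prefix."""

    def __init__(self, archive_prefix: str):
        super().__init__()
        self.archive_prefix = archive_prefix
        self.leaks = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only embedded resources (src) are fetched automatically on load.
            if name != "src" or not value:
                continue
            # Relative URLs resolve inside the archive, so only absolute
            # and protocol-relative URLs are candidates for leakage.
            is_absolute = value.startswith(("http://", "https://", "//"))
            if is_absolute and not value.startswith(self.archive_prefix):
                self.leaks.append(value)


# Hypothetical replayed page: one rewritten image, and one ad script
# that was left pointing at the live web.
replayed_html = (
    '<img src="https://web.archive.org/web/2008/http://example.com/logo.png">'
    '<script src="http://ads.example.com/ad.js"></script>'
)

finder = LeakFinder("https://web.archive.org/web/")
finder.feed(replayed_html)
```

A static scan like this only sees URLs present in the markup; requests constructed by JavaScript at load time would need to be observed in a running browser instead.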
Temporal violations: reconstructing legitimately
archived resources into a page that never existed
http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
text (2004-12) says rain, image (2005-09) is clear
Incorrectly replaying the 2004 weather forecast for
Varina, Iowa is hardly the stuff of dystopian cyberpunk.
But there are cases where temporal violations begin to look like tampering…
Remember the case of Joy Reid’s blog?
https://www.odu.edu/news/2018/5/michael_nelson
https://twitter.com/DrDanetteAllen/status/990228054952865793
https://twitter.com/phonedude_mln/status/990054945457147904
HTML archived on 2006-01-11
JS archived on 2006-02-07
Reid was a prolific blogger, so a gap of nearly a month is catastrophic for temporal integrity.
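The size of that gap follows directly from the two capture dates. This is a minimal sketch using only the dates given on the slide; the `temporal_spread_days` helper is a hypothetical name, and a real check would cover every embedded resource rather than two.

```python
from datetime import date

# Capture dates from the slide, keyed by resource type.
captures = {
    "html": date(2006, 1, 11),
    "js": date(2006, 2, 7),
}


def temporal_spread_days(capture_dates) -> int:
    """Days between the earliest and latest capture used to compose one page."""
    dates = list(capture_dates)
    return (max(dates) - min(dates)).days


spread = temporal_spread_days(captures.values())
print(spread)
```

A replay system could compare a spread like this against how often the page was updated in order to flag compositions that likely never existed.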
Not always Javascript – cookies cause the web archive to store
the Urdu language page at the URL for the English page
https://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html
Cookies + Javascript =
A combo Urdu / Portuguese / English page that never existed
Crawling & ingest errors could be exploited to
amplify an existing disinformation narrative
https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change
“only a ‘crisis actor’ would tweet in Slovak!”
Now imagine she gets fed up, deletes her account, and then someone applies the
“abandoned acct / archive” attack Justin Littman described:
https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
Web archives are unreliable narrators.
Unreliable narrators cause us to question
everything we’ve been told.
Let’s prove Lester Holt did not “fudge the tape”!
https://twitter.com/AaronBlake/status/1035124642456002565
https://twitter.com/realDonaldTrump/status/1035120511259500544
https://news.vice.com/en_us/article/ne5x3d/trump-lester-holt-james-comey-nbc
The May, 2017 NBC interview
is not archived until August, 2018
(and even then, the video itself is not archived)
https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
https://web.archive.org/web/*/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
https://web.archive.org/web/20180825094239/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila
Clicking through to the video reveals a loop of a postal
carrier slipping on ice, not the Lester Holt interview.
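The capture datetime is encoded in the archived URL itself. The sketch below extracts it, assuming the `/web/<14 digits>/` path layout seen in the web.archive.org URLs on this slide; the `parse_uri_m` name is hypothetical, and other archives may lay out their replay URLs differently.

```python
import re
from datetime import datetime, timezone

# A 14-digit timestamp (YYYYMMDDhhmmss) followed by the original URL.
URI_M_RE = re.compile(r"/web/(\d{14})/(.+)$")


def parse_uri_m(uri_m: str):
    """Split a Wayback-style URI-M into (capture datetime, original URL)."""
    match = URI_M_RE.search(uri_m)
    if match is None:
        raise ValueError(f"no 14-digit timestamp found in {uri_m!r}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), match.group(2)


# The archived URL from this slide, split across lines for readability.
uri_m = (
    "https://web.archive.org/web/20180825094239/"
    "https://www.nbcnews.com/nightly-news/video/"
    "pres-trump-s-extended-exclusive-interview-with-lester-holt-"
    "at-the-white-house-941854787582?v=raila"
)
captured, original = parse_uri_m(uri_m)
```

Here `captured` falls in August 2018, well over a year after the May 2017 interview the page describes.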
Errors in crawling and playback are hard
to distinguish from tampering
https://twitter.com/katestarbird/status/911257133231910913
https://er.educause.edu/articles/2018/10/managing-the-cultural-record-in-the-information-warfare-era
I want to explicitly note here the difference between the
act of quietly rewriting the record and enjoying the results
of the rewrites that are accepted as truth and that of
deliberately destroying the confidence of the public
(including the scholarly community) by creating compromise,
confusion, and ambiguity to suggest that the record cannot
be trusted.
Disinformation applied to web archives
doesn’t necessarily mean you have to
insert a specific narrative into the archive.
You just need to cast doubt on the
archive as our collective memory.
note:
“…both the live Web and the Wayback
Machine [...] are reasonably reliable for
everyday use”
https://blog.dshr.org/2020/03/guest-post-michael-nelsons-response.html
https://ws-dl.blogspot.com/2020/03/2020-03-07-at-nexus-of-cni-keynote-and.html
We’re unaware of any cases where web
archive content has been hacked or faked
for any substantive goal.
However, web archives are not immune.
It’s just that the theater of conflict has yet to
expand to include web archives.
Twitter then and now
http://inventorspot.com/articles/top_ten_twitterati_tweet_above_rest_31806
https://www.vox.com/policy-and-politics/2017/10/19/16504510/ten-gop-twitter-russia
Facebook then and now
https://twitter.com/Pinboard/status/975013825010458624
https://web.archive.org/web/20090722095954/http://facebook.com/zuck
See also: https://www.businessinsider.com/facebook-old-posts-mark-zuckerberg-disappeared-2019-3
Gmail then and now
http://googlepress.blogspot.com/2004/04/google-gets-message-launches-gmail.html
https://www.avanan.com/resources/gmail-exploit-allows-dnc-email-attack
Web archives then and soon
https://web.archive.org/web/20020601134105/http://www.businessweek.com/technology/content/feb2002/tc20020228_1080.htm
?
Why do we expect things to be different
for web archives?
Our trust model for web archives is still
rooted in the 1980s / early 90s.
My chronology with Unix
Late 80s: 1 computer, many users
Used an X terminal to access Cray, Convex supercomputers
90s: 1 computer, 1 user
My Sun IPX workstation was the first www.larc.nasa.gov
now: many computers, 1 user
I’m not even sure how many computers I have access to
From brewster@wais.com Sun Apr 25 00:03:19 1993
Received: from express.larc.nasa.gov by blearg.larc.nasa.gov with SMTP
    (5.65.2/server2.4) id AA28277; Sun, 25 Apr 93 00:00:26 -0400
Received: from wais.wais.com by express.larc.nasa.gov with SMTP id BA21157
    (SMTP/Lite-1.15) for <m.l.nelson@larc.nasa.gov>; Sun, 25 Apr 93 00:00:20 -0400
Received: by wais.wais.com (4.1/SMI-4.1/Brent-911016)
    id AA14369; Sat, 24 Apr 93 20:47:54 PDT
Date: Sat, 24 Apr 93 20:47:54 PDT
Message-Id: <9304250347.AA14369@wais.wais.com>
From: Brewster Kahle <brewster@wais.com>
To: abc@concert.net
To: admin@ds.internic.net
To: akers@fiddle.oit.unc.edu
To: anders@ifi.uio.no
To: anders@munin.ub2.lu.se
…
To: m.l.nelson@LaRC.NASA.GOV
…
To: root@ds.internic.net
To: root@ncgia.ucsb.edu
To: root@fiddle.oit.unc.edu
To: root@oac.hsc.uth.tmc.edu
To: root@samba.acs.unc.edu
To: root@spk41.usace.mil
To: root@stone.ucs.indiana.edu
To: root@sunsite.unc.edu
To: root@uniwa.uwa.oz.au
To: root@uva.ci.uv.es
To: root@nic.funet.fi
…
WAIS server maintainers,
As you probably know through wais-discussion, we are announcing
the commercial WAIS server this thursday. There is a big press
event and showcase at the WAIS Inc offices.
Thank you, everyone, for making it possible for us to pull off a
startup company.
We are considering running a special price for a limited time for
those that know and understand WAIS already. We would like to
discuss this with those that might be interested in it, and would
like to help us determine how it should work. Most people will
continue to use the freeware, and that is fine, this is for those
that might be interested in a commercial version. At this time,
we will not be discussing the differences between things or other
products.
Given that the press has started to call and ask for information
before hand (to scoop this story, you know the press...), we have
had to keep a very quiet profile.
On the other hand, we need the help from all of you. Generally,
this is done with a signed non-disclosure basis, but this wont
work on the Internet and not in time.
What I was thinking was to ask anyone that would like to discuss
this, to send an "email non-disclosure" to
non-disclosed-waisites-request@wais.com.
I wish this weren't so baroque, but you could not believe some of
the members of the press I have talked to. If one reporter
publishes early, it can spoil things (and get it wrong).
(please dont email to me. At this point, my cup floweth over. I
will dig out after the showcase!)
-brewster
When computers were $$$, an email to “root” could be expected
to be received by someone entrusted with the necessary $$$
to responsibly administer the machine.
IOW, “root” was almost always a white hat.
It hasn’t been like that for a long time.
Web archives are like the Unix mainframes of today.
TMC->WAIS Inc->AOL->Alexa->IA
https://twitter.com/phonedude_mln/status/1105160308866338816
How well do you know root@archive.org?
As in, could you call/email him right now and expect a response?
Our entire national digital preservation strategy is predicated on
Brewster Kahle “not being evil”™
If he is leading a 25+ year sleeper cell, we’re doomed.
How well do you know these roots?
Many more: https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives
Up until now, we’ve only looked at failures
or edge cases in crawling and replay.
What about deliberate fakes?
Cut-n-paste / mashup “fakes” for humor
Victorian Photo Collage
https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage
“The Flying Saucer” (1956)
https://en.wikipedia.org/wiki/The_Flying_Saucer_(song)
https://www.youtube.com/watch?v=XCrn6QXvHLg
Brian Williams Raps ‘Gin & Juice’
https://www.youtube.com/watch?v=XlGLhYFrv6w
We’ve always had “fakes”, the most convincing of
which require significant skills, knowledge, and
physical access
https://en.wikipedia.org/wiki/Piltdown_Man
https://en.wikipedia.org/wiki/Shroud_of_Turin
https://www.npr.org/templates/story/story.php?storyId=94461486
“deep learning” + “fake” = deepfakes
https://motherboard.vice.com/en_us/article/7x799b/selling-ai-generated-fake-porn-is-probably-a-good-way-to-get-sued
https://motherboard.vice.com/en_us/article/ev5eba/ai-fake-porn-of-friends-deepfakes
Becoming more mainstream:
https://twitter.com/MikaelThalen/status/1090349932266094593
https://deepfakesapp.online/
A “safe for work” example:
No longer buried in the dark corners of Reddit:
Not just porn or novelties: just better
revoicing?
“Deepfake technology has
helped us scale campaign
efforts like never before,”
Neelkant Bakshi, co-incharge
of social media and IT for BJP
Delhi, tells VICE. “The Haryanvi
videos let us convincingly
approach the target audience
even if the candidate didn’t
speak the language of the voter.”
https://www.vice.com/en_in/article/jgedjb/the-first-use-of-deepfakes-in-indian-election-by-bjp
That deepfakes are even possible forces
us to recalibrate our information literacy
https://www.motherjones.com/politics/2019/03/deepfake-gabon-ali-bongo/
“But when Gabon’s
government
released the video, it raised
more questions than it
answered. ... One week after
the video’s release, Gabon’s
military attempted an ultimately
unsuccessful coup— the
country’s first since 1964—
citing the video’s oddness as
proof something was amiss
with the president.”
Or perhaps deepfakes are just the next Photoshop,
and skepticism vs. gullibility is more about us
than about the media itself
If a fake email can cause this much impact, then it is fair to ask:
are deepfakes and fake provenance via web archives even necessary?
https://twitter.com/i/events/1230892474140430337
“Detecting” deepfakes will happen.
“Preventing” deepfakes won’t happen; they’re here to stay:
Mementos, even of a fake past, are core to the human condition.
“Did you get your precious photos?”
“Implants. Those aren't your memories, they're somebody else's. They're Tyrell's niece's.”
http://deepemotions.free.fr/theme_1.html
Real photos, fake memories: replicants attach significant value to photos,
even when they know the memories are fake.
Next Thanksgiving dinner,
liven up the discussion with your extended family
1. Extract just 0:23–0:26 of the Obama/Peele video
2. Embed in an HTML page
3. Use Javascript to rewrite the banner and browser URL
– Datetime: 2016-11-09
– URL: www.whitehouse.gov/totallyNotFake
4. Claim the deep state deleted the page from the live web
https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed
https://www.youtube.com/watch?time_continue=43&v=cQ54GDm1eL0#t=0m23s
Archives have vulnerabilities.
Inserting fakes into real archives
Here’s an actual page in the IA “proving”
Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg.
John Berlin, MS Thesis, 2018
https://www.youtube.com/watch?v=k3QTcJZdFfs
(actual URI-R & URI-M have also been obscured in the video to hide the technique)
The content is clearly fake, but it demonstrates that it’s possible
to write Javascript that attacks the archive’s playback capability.
It takes an archiving expert to tell the difference.
We’ve known about these & other attacks
since 2017
http://labs.rhizome.org/presentations/security.html#/
https://acmccs.github.io/papers/p1741-lernerAT3.pdf
https://blog.dshr.org/2017/06/wac2017-security-issues-for-web-archives.html
https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
There are other ways, presumably still
hypothetical, to attack the archives
https://twitter.com/internetarchive/status/596768668756774914
https://xkcd.com/538/
https://www.theguardian.com/uk-news/2018/sep/05/planes-trains-and-fake-names-the-trail-left-by-skripal-suspects
https://www.cnn.com/2018/10/22/middleeast/saudi-operative-jamal-khashoggi-clothes/index.html
“Planes, trains and fake names:
the trail left by Skripal suspects”
“Surveillance footage shows
Saudi 'body double' in
Khashoggi's clothes after he was
killed, Turkish source says”
Before you say “that will never happen!”
Reminder: agents, dissidents, journalists have all disappeared;
they won’t mind adding a librarian/sysadmin to the list
I’ve got good news and bad news:
Setting up a web archive is not as difficult
or expensive as it used to be.
OpenWayback, WAIL, pywb, et al. + cloud storage =
you can have a web archive running in about the same time
it took to generate the Steve Buscemi / Jennifer Lawrence deepfake.
https://github.com/iipc/openwayback
https://github.com/N0taN3rd/wail
https://machawk1.github.io/wail/
https://github.com/webrecorder/pywb
Inserting fakes into fake archives
breitbart.com/wayback/*/whitehouse.gov/totallyNotFake
infowars.com/web/*/whitehouse.gov/totallyNotFake
iluv.aynrand.org/*/whitehouse.gov/totallyNotFake
InternetResearchAgency.ru/whitehouse.gov/totallyNotFake
How well do you know root at these archives?
Are they really four different archives, or one root for all of them?
What if 99.9% of the time they faithfully replay pages?
http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html
What if we start off with > (n/2)+1
archives compromised?
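The concern can be made concrete with a toy majority vote over content digests. This is a simplified illustration and not any real archive's or preservation system's consensus protocol; the archive names, the page bodies, and the `majority_digest` helper are all hypothetical.

```python
import hashlib
from collections import Counter


def digest(body: bytes) -> str:
    """SHA-256 hex digest of one archive's copy of a page."""
    return hashlib.sha256(body).hexdigest()


def majority_digest(copies: dict):
    """Return the digest held by a strict majority of archives, else None."""
    counts = Counter(digest(body) for body in copies.values())
    winner, votes = counts.most_common(1)[0]
    return winner if votes > len(copies) / 2 else None


genuine = b"<p>original page</p>"
fake = b"<p>bespoke fake</p>"

# Hypothetical: three of five archives are compromised and serve the same fake.
copies = {
    "archive-a": genuine,
    "archive-b": genuine,
    "archive-c": fake,
    "archive-d": fake,
    "archive-e": fake,
}
consensus = majority_digest(copies)
```

In this setup the vote settles on the fake, so agreement among archives only helps when most of them are honest to begin with.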
What if the archives were targeted to amplify a
specific disinformation narrative?
And what if the archives had no choice but to
cooperate?
The University of Farmington is fake
DHS strong-armed a “.edu” registration; they could do the same to IA & others too
https://twitter.com/nwarikoo/status/1090726638034276352
https://web.archive.org/web/20161023170733/https://universityoffarmington.edu/
https://twitter.com/phonedude_mln/status/1092464939040755712
First capture: 2016-10-23
Blockchain to the rescue!!!
<lasers>
<sirens>
<disco-thumping-soundtrack>
nope.
https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/
https://eprint.iacr.org/2017/375.pdf
https://blog.dshr.org/search/label/bitcoin
There is no shortage of
deepfake vs. blockchain stories
https://www.wired.com/story/the-blockchain-solution-to-our-deepfake-problems/
https://www.longhash.com/news/the-coming-war-between-deepfakes-and-blockchain
A Voight-Kampff Test for deepfakes
now seems around the corner
https://twitter.com/TechCrunch/status/1009556795965296642
https://www.technologyreview.com/s/611726/the-defense-department-has-produced-the-first-tools-for-catching-deepfakes/
Are we prepared for the
unintended consequences?
“Enforcing digital signatures for all
cameras and video devices would
offer the same capability in reverse.
Suddenly every photograph and video
shared online could be traced back to
its original owner. Security services in
a repressive regime could scour social
media for all videos depicting them in
a negative light and trace them back
to the precise individuals who captured
the video, arresting them en masse.”
https://www.forbes.com/sites/kalevleetaru/2018/09/09/why-digital-signatures-wont-prevent-deep-fakes-but-will-help-repressive-governments/
On the other hand, “blockchaining” our pets is a study in
incompatibility, so tracking photos may never happen
https://www.aspca.org/about-us/aspca-policy-and-position-statements/microchips
https://moviepaws.com/2017/10/22/owls-snakes-and-unicorns-the-animals-of-blade-runner/
In Blade Runner, synthetic pets
have serial numbers
(real pets are unavailable
to all but the richest).
“While most of the world has
accepted these standards,
North America has not. The
primary problem is a competitive,
technological one involving the
compatibility of the microchips
and the readers that are used
by shelters and veterinary clinics.”
As for blockchains and web archives…
This is not what you think it is…
https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps
“…right now you can get timestamps for every book,
movie, song, computer program, legal document,
etc. in the thousands of collections in the archive.
In the future we hope to be able to work with the
Internet Archive to extend this to timestamping
website snapshots…”
That’s never going to happen.
(at least not by a third party through the playback interface)
Archives Aren’t Magic Web Sites
They’re Just Web Sites.
https://ws-dl.blogspot.com/2019/08/2019-08-30-where-did-archive-go-part1.html
https://ws-dl.blogspot.com/2019/09/2019-09-10-where-did-archive-go-part-2.html
https://ws-dl.blogspot.com/2019/09/2019-09-25-where-did-archive-go-part-3.html
https://ws-dl.blogspot.com/2019/10/2019-10-21-where-did-archive-go-part-4.html
Archive URI-Ms
-----------------------------
perma-archives.org 182
bibalex.org 199
webarchive.org.uk 349
bac-lac.gc.ca 351
proni.gov.uk 469
digar.ee 488
webharvest.gov 712
internetmemory.org 979
nationalarchives.gov.uk 994
stanford.edu 1222
archive-it.org 1383
archive.is 1396
web.archive.org 1566
arquivo.pt 1569
webcitation.org 1585
vefsafn.is 1589
loc.gov 1594
-----------------------------
Total 16627
Sample 16k+ Mementos from 17 Web Archives
https://arxiv.org/abs/1905.03836
Periodically Replay Each Archived Page
Above example: http://perma-archives.org/warc/20170101182813/http://umich.edu/
35 times, from Nov. 2017 – Oct. 2018
For each replay, we download both the rewritten version and the “raw” version (where possible).
Partial archive outage because
of security / maintenance upgrade
Post-upgrade, replay is variable.
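One way to request the “raw” version of a memento: many Wayback-style replay systems accept an `id_` modifier after the 14-digit timestamp and then return the archived payload without link rewriting or banner injection. Whether a particular archive honors the modifier is an assumption here (hence “where possible”), and `raw_urim` is a hypothetical helper name.

```python
import re

# A Wayback-style URI-M carries a 14-digit timestamp as a path segment.
_TIMESTAMP = re.compile(r"/(\d{14})/")

def raw_urim(urim):
    """Insert the `id_` modifier after the timestamp of a Wayback-style URI-M.

    URIs without a 14-digit timestamp segment are returned unchanged.
    """
    return _TIMESTAMP.sub(r"/\g<1>id_/", urim, count=1)

rewritten = "http://perma-archives.org/warc/20170101182813/http://umich.edu/"
raw = raw_urim(rewritten)
```

Downloading both `rewritten` and `raw` on every replay lets later comparisons separate changes introduced by the replay layer from changes in the stored payload.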
More Archived Pages Changed Every Time
Than Never Changed
(yes, this experiment used “raw” mode)
Never changed:
2007 URI-Ms (1 in 8)
Always changed:
2773 URI-Ms (1 in 6)
Fixity-based approaches, including blockchain, will not work.
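The categories above can be sketched as a hash comparison over repeated replays of the same URI-M. This is an illustration only: the function name `classify` is hypothetical, SHA-256 is an arbitrary choice of hash, and reading “always changed” as “no two replays produced the same digest” is an assumption, not necessarily the study's exact definition.

```python
import hashlib

def classify(replays):
    """Classify one URI-M from the bodies (bytes) returned by its repeated replays."""
    if not replays:
        raise ValueError("need at least one replay")
    digests = [hashlib.sha256(body).hexdigest() for body in replays]
    distinct = len(set(digests))
    if distinct == 1:
        return "never changed"       # every replay hashed identically
    if distinct == len(digests):
        return "always changed"      # assumption: no two replays matched
    return "sometimes changed"

stable = classify([b"<html>same</html>"] * 3)
volatile = classify([b"<html>1</html>", b"<html>2</html>", b"<html>3</html>"])
mixed = classify([b"<html>1</html>", b"<html>1</html>", b"<html>2</html>"])
```

The slide's ratios follow from the sample size: 16627 / 2007 ≈ 8.3 (about 1 in 8) and 16627 / 2773 ≈ 6.0 (about 1 in 6). A hash recorded once, on a blockchain or anywhere else, can only ever be re-verified for pages in the first category.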
“Hash the screen shot, not the HTML!”
That doesn’t work either.
1 WARC file, 2 Wayback Machines, 3 Browsers
= 6 different replays
http://wayback.archive-it.org/all/20130106140348/http://www.harvard.edu/
http://web.archive.org/web/20130106140348/http://www.harvard.edu/
see also: https://ws-dl.blogspot.com/2016/12/2016-12-20-archiving-pages-with.html
Why not create a LOCKSS for web archives?
“The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying
to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?”
Web archives are not especially interoperable.
There are many issues regarding
interoperability, but generational loss is a good
demonstration of incompatible assumptions
about simulating the past.
https://web.archive.org/web/20180501125952/https:/twitter.com/phonedude_mln/status/990054945457147904
http://archive.is/PaKx6
https://perma.cc/3HMS-TB59
http://www.webcitation.org/77RhNeyoZ
https://web.archive.org/web/20190407024654/https://perma.cc/3HMS-TB59
https://web.archive.org/web/20190407031659/http://www.webcitation.org/77RhNeyoZ
Web archiving interoperability: a metaphor
(non-synthetic pets, possibly microchipped)
https://www.youtube.com/watch?v=SQudKvrwDAU
To summarize:
Existing, trusted archives can be compromised by:
1) crawling malicious pages,
2) attacking facilities / personnel, or
3) court orders
A lowered resource threshold for archives allows:
1) “long game” archives: faithful now, corrupt later,
2) “sock puppet” archives: surreptitiously cooperating archives
The nature of web archives is to change content –
current fixity-based approaches will not help.
Looking forward:
We need new models for web archiving
and verifying authenticity.
The Heritrix / Wayback Machine
technology stack, while successful, has
limited our thinking.
“Studies generally suggest that, year after year, less than 60
percent of web traffic is human; … For a period of time in
2013, the Times reported this year, a full half of YouTube
traffic was “bots masquerading as people,” a portion so high
that employees feared an inflection point after which
YouTube’s systems for detecting fraudulent traffic would
begin to regard bot traffic as real and human traffic as fake.
They called this hypothetical event “the Inversion.””
http://nymag.com/intelligencer/2018/12/how-much-of-the-internet-is-fake.html
In the IA: robots outnumber humans 10:1 in sessions, 5:4 in HTTP connections, ca. 2012
http://arxiv.org/abs/1309.4016
https://giphy.com/gifs/harrison-ford-blade-runner-sean-young-yjB2fwqjv5rry/media
I suspect the core of the new model will have a lot
in common with click farms
https://twitter.com/mbrennanchina/status/1072114511212109824
Record what we saw at crawl time as a baseline,
then we need a distance measure between crawl time and replay time
http://dx.doi.org/10.5210/fm.v22i112.8097
https://ws-dl.blogspot.com/2013/05/2013-05-25-game-walkthroughs-as.html
Documenting instead of archiving…
1) Robotic witnesses
2) New Nielsen families
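As one illustration of a distance between what was recorded at crawl time and what is seen at replay time, the sketch below uses Jaccard distance over the sets of whitespace-separated tokens in the two texts. The talk does not name a specific measure; Jaccard distance and the helper name `jaccard_distance` are assumptions chosen only for simplicity.

```python
def jaccard_distance(crawl_text, replay_text):
    """Return 1 - |A ∩ B| / |A ∪ B| over the token sets of the two texts.

    0.0 means identical token sets; 1.0 means nothing in common.
    """
    crawl_tokens = set(crawl_text.split())
    replay_tokens = set(replay_text.split())
    if not crawl_tokens and not replay_tokens:
        return 0.0  # two empty pages are treated as identical
    shared = crawl_tokens & replay_tokens
    combined = crawl_tokens | replay_tokens
    return 1.0 - len(shared) / len(combined)

baseline = "Welcome to the University of Michigan"
replayed = "Welcome to the University of Michigan archived copy"
drift = jaccard_distance(baseline, replayed)
```

Unlike a hash comparison, a distance tolerates the small differences that replay always introduces, and a threshold on it (to be chosen empirically) could flag replays that have drifted too far from the baseline.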
Some of you might be thinking
“but I don’t like Blade Runner – what can I take
away from this talk?”
(my wife refers to the film as “serious white guys talking”)
Two methods for passing the
Voight-Kampff Test for Blade Runner fandom
1) Is Deckard a replicant?
In the book, he’s definitely human. In the seven (!) versions of
the movie, it ranges from “ambiguous” to “replicant”.
https://moviepaws.com/2017/10/22/owls-snakes-and-unicorns-the-animals-of-blade-runner/
https://en.wikipedia.org/wiki/Themes_in_Blade_Runner
https://en.wikipedia.org/wiki/Blade_Runner#Versions (Hello, FRBR)
2) “Tears in Rain” – Greatest monologue in sci-fi?
Or greatest monologue of all time?
I've seen things you people wouldn't believe. Attack ships on fire
off the shoulder of Orion. I watched C-beams glitter in the dark
near the Tannhäuser Gate. All those moments will be lost in time,
like tears in rain. Time to die.
https://www.youtube.com/watch?v=9hDo80ddn4Q
https://en.wikipedia.org/wiki/Tears_in_rain_monologue
https://www.youtube.com/watch?v=BM54jXndyvQ
I've crawled things you people wouldn't believe. Clickjacking attacks
off the x-frame-options: sameorigin. I watched ajax requests redirect
at the aggregator TimeGate. All those pages will be lost in time,
like tears in rain. Time to lie.

More Related Content

What's hot

Open Education Leadership: National Trends & Best Practices
Open Education Leadership: National Trends & Best PracticesOpen Education Leadership: National Trends & Best Practices
Open Education Leadership: National Trends & Best PracticesNicole Allen
 
Personal Learning Networks
Personal Learning NetworksPersonal Learning Networks
Personal Learning NetworksFloyd Pentlin
 
Web20 School20 4ss
Web20 School20 4ssWeb20 School20 4ss
Web20 School20 4ssdwarlick
 
The State of Digital Pedagogy: The Intersection of Networks & Institutions
The State of Digital Pedagogy: The Intersection of Networks & InstitutionsThe State of Digital Pedagogy: The Intersection of Networks & Institutions
The State of Digital Pedagogy: The Intersection of Networks & InstitutionsBonnie Stewart
 
The Information Revolution
The Information RevolutionThe Information Revolution
The Information Revolutionrpop1012
 
Exploring Digital Cultures W12: The Wikipedia Debate
Exploring Digital Cultures W12: The Wikipedia DebateExploring Digital Cultures W12: The Wikipedia Debate
Exploring Digital Cultures W12: The Wikipedia DebateNoNeedforInk
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015Michael Nelson
 
2012bruneluniversity 120615094336-phpapp01
2012bruneluniversity 120615094336-phpapp012012bruneluniversity 120615094336-phpapp01
2012bruneluniversity 120615094336-phpapp01Maria Palaska
 
If you love your content, set it free (v3.0)
If you love your content, set it free (v3.0) If you love your content, set it free (v3.0)
If you love your content, set it free (v3.0) Mike Ellis
 
Learning Futures: How new & emerging technologies will impact learning and de...
Learning Futures: How new & emerging technologies will impact learning and de...Learning Futures: How new & emerging technologies will impact learning and de...
Learning Futures: How new & emerging technologies will impact learning and de...Learning Pool Ltd
 
Technology Tools In The Classroom: Using Computers To Engage Your Students
Technology Tools In The Classroom:  Using Computers To Engage Your StudentsTechnology Tools In The Classroom:  Using Computers To Engage Your Students
Technology Tools In The Classroom: Using Computers To Engage Your Studentsforestfortrees
 
Why Do Students Plagiarise?
Why Do Students Plagiarise?Why Do Students Plagiarise?
Why Do Students Plagiarise?Cathy Oxley
 
Students Worlds Cff 4ss
Students Worlds Cff 4ssStudents Worlds Cff 4ss
Students Worlds Cff 4ssdwarlick
 
Classrooms Of The Future Presentaiton
Classrooms Of The Future PresentaitonClassrooms Of The Future Presentaiton
Classrooms Of The Future Presentaitoncarteramsv
 
Classrooms Of The Future Presentaiton
Classrooms Of The Future PresentaitonClassrooms Of The Future Presentaiton
Classrooms Of The Future Presentaitoncarteramsv
 
MicroLearning 2007
MicroLearning 2007MicroLearning 2007
MicroLearning 2007David Smith
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Shawn Jones
 
Digital etiquette
Digital etiquetteDigital etiquette
Digital etiquettepcooney1
 

What's hot (20)

Open Education Leadership: National Trends & Best Practices
Open Education Leadership: National Trends & Best PracticesOpen Education Leadership: National Trends & Best Practices
Open Education Leadership: National Trends & Best Practices
 
Personal Learning Networks
Personal Learning NetworksPersonal Learning Networks
Personal Learning Networks
 
Web20 School20 4ss
Web20 School20 4ssWeb20 School20 4ss
Web20 School20 4ss
 
The State of Digital Pedagogy: The Intersection of Networks & Institutions
The State of Digital Pedagogy: The Intersection of Networks & InstitutionsThe State of Digital Pedagogy: The Intersection of Networks & Institutions
The State of Digital Pedagogy: The Intersection of Networks & Institutions
 
The Information Revolution
The Information RevolutionThe Information Revolution
The Information Revolution
 
Exploring Digital Cultures W12: The Wikipedia Debate
Exploring Digital Cultures W12: The Wikipedia DebateExploring Digital Cultures W12: The Wikipedia Debate
Exploring Digital Cultures W12: The Wikipedia Debate
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
2012bruneluniversity 120615094336-phpapp01
2012bruneluniversity 120615094336-phpapp012012bruneluniversity 120615094336-phpapp01
2012bruneluniversity 120615094336-phpapp01
 
If you love your content, set it free (v3.0)
If you love your content, set it free (v3.0) If you love your content, set it free (v3.0)
If you love your content, set it free (v3.0)
 
Wizard of Apps Revised
Wizard of Apps RevisedWizard of Apps Revised
Wizard of Apps Revised
 
NCAGT Wikipedia
NCAGT WikipediaNCAGT Wikipedia
NCAGT Wikipedia
 
Learning Futures: How new & emerging technologies will impact learning and de...
Learning Futures: How new & emerging technologies will impact learning and de...Learning Futures: How new & emerging technologies will impact learning and de...
Learning Futures: How new & emerging technologies will impact learning and de...
 
Technology Tools In The Classroom: Using Computers To Engage Your Students
Technology Tools In The Classroom:  Using Computers To Engage Your StudentsTechnology Tools In The Classroom:  Using Computers To Engage Your Students
Technology Tools In The Classroom: Using Computers To Engage Your Students
 
Why Do Students Plagiarise?
Why Do Students Plagiarise?Why Do Students Plagiarise?
Why Do Students Plagiarise?
 
Students Worlds Cff 4ss
Students Worlds Cff 4ssStudents Worlds Cff 4ss
Students Worlds Cff 4ss
 
Classrooms Of The Future Presentaiton
Classrooms Of The Future PresentaitonClassrooms Of The Future Presentaiton
Classrooms Of The Future Presentaiton
 
Classrooms Of The Future Presentaiton
Classrooms Of The Future PresentaitonClassrooms Of The Future Presentaiton
Classrooms Of The Future Presentaiton
 
MicroLearning 2007
MicroLearning 2007MicroLearning 2007
MicroLearning 2007
 
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
Social Cards Probably Provide For Better Understanding Of Web Archive Collect...
 
Digital etiquette
Digital etiquetteDigital etiquette
Digital etiquette
 

Similar to Web Archives at the Nexus of Good Fakes and Flawed Originals

Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael Nelson
 
Big Data, Linked Data
Big Data, Linked DataBig Data, Linked Data
Big Data, Linked DataPaul Miller
 
RBMS LODLAM presentation
RBMS LODLAM presentationRBMS LODLAM presentation
RBMS LODLAM presentationJon Voss
 
MD 400 Introduction
MD 400 IntroductionMD 400 Introduction
MD 400 Introductionjjh3810
 
Nordkapp dConstruct09 Recap
Nordkapp dConstruct09 RecapNordkapp dConstruct09 Recap
Nordkapp dConstruct09 RecapTeppo Kotirinta
 
Linking Open Government Data
Linking Open Government DataLinking Open Government Data
Linking Open Government Data3 Round Stones
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentAnita Riley
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Judy O'Connell
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesMichael Nelson
 
Radically Open at the National Archives
Radically Open at the National ArchivesRadically Open at the National Archives
Radically Open at the National ArchivesJon Voss
 
Finding harmony in web development
Finding harmony in web developmentFinding harmony in web development
Finding harmony in web developmentChristian Heilmann
 
185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq
185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq
185 Toefl Writing Twe) Topics And Model Essays Pdf WiziqJessica Rinehart
 
RDFa From Theory to Practice
RDFa From Theory to PracticeRDFa From Theory to Practice
RDFa From Theory to PracticeAdrian Stevenson
 
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertationSearch Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertationDenis Shestakov
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesShawn Jones
 
Linked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas SeminarLinked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas SeminarAdrian Stevenson
 
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Kara Van Malssen
 
Ensuring a positive reception for Augmented Reality
Ensuring a positive reception for Augmented RealityEnsuring a positive reception for Augmented Reality
Ensuring a positive reception for Augmented RealityDavid Wood
 
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLES
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLESSHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLES
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLESChangeist
 

Similar to Web Archives at the Nexus of Good Fakes and Flawed Originals (20)

Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Big Data, Linked Data
Big Data, Linked DataBig Data, Linked Data
Big Data, Linked Data
 
RBMS LODLAM presentation
RBMS LODLAM presentationRBMS LODLAM presentation
RBMS LODLAM presentation
 
MD 400 Introduction
MD 400 IntroductionMD 400 Introduction
MD 400 Introduction
 
Nordkapp dConstruct09 Recap
Nordkapp dConstruct09 RecapNordkapp dConstruct09 Recap
Nordkapp dConstruct09 Recap
 
Linking Open Government Data
Linking Open Government DataLinking Open Government Data
Linking Open Government Data
 
Bridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital EnvironmentBridging the Gap Between Print and Digital Environment
Bridging the Gap Between Print and Digital Environment
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Radically Open at the National Archives
Radically Open at the National ArchivesRadically Open at the National Archives
Radically Open at the National Archives
 
Finding harmony in web development
Finding harmony in web developmentFinding harmony in web development
Finding harmony in web development
 
So what if it's a bubble?
So what if it's a bubble?So what if it's a bubble?
So what if it's a bubble?
 
185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq
185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq
185 Toefl Writing Twe) Topics And Model Essays Pdf Wiziq
 
RDFa From Theory to Practice
RDFa From Theory to PracticeRDFa From Theory to Practice
RDFa From Theory to Practice
 
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertationSearch Interfaces on the Web: Querying and Characterizing, PhD dissertation
Search Interfaces on the Web: Querying and Characterizing, PhD dissertation
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web Archives
 
Linked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas SeminarLinked Data and the Semantic Web - Mimas Seminar
Linked Data and the Semantic Web - Mimas Seminar
 
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
Planning Beyond Digitization: Digital Preservation for Audiovisual Collections
 
Ensuring a positive reception for Augmented Reality
Ensuring a positive reception for Augmented RealityEnsuring a positive reception for Augmented Reality
Ensuring a positive reception for Augmented Reality
 
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLES
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLESSHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLES
SHADOW SELVES: LIVING WITH (OR WITHOUT) OUR BIG DATA DOUBLES
 

More from Michael Nelson

Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesMichael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web ArchivesMichael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolMichael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Michael Nelson
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet ArchiveMichael Nelson
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeMichael Nelson
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageMichael Nelson
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better Michael Nelson
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Michael Nelson
 




Web Archives at the Nexus of Good Fakes and Flawed Originals

  • 1. Drexel CCI IS Department Distinguished Speaker Series, 2020-03-09 @phonedude_mln, @WebSciDL Web Archives at the Nexus of Good Fakes and Flawed Originals Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, John Berlin, Mohamed Aturban, Sawood Alam LANL: Martin Klein, DANS: Herbert Van de Sompel Supported in part by The Andrew Mellon Foundation and the National Science Foundation
  • 2. "You’re in a desert walking along in the sand when all of the sudden you look down, and you see a tortoise..."
  • 3. https://en.wikipedia.org/wiki/Blade_Runner National Film Registry Induction, 1993: https://www.loc.gov/loc/lcib/94/9405/film.html http://www.loc.gov/static/programs/national-film-preservation-board/documents/blade_runner.pdf 1982 1968
  • 4. https://www.youtube.com/watch?v=LwDdP88Dr54
  • 5. https://www.youtube.com/watch?v=LwDdP88Dr54
  • 6. We’re not going to review RS’s/PKD’s predictions https://www.cnn.com/2018/12/28/movies/blade-runner-predictions-2019-trnd/ https://twentytwowords.com/blade-runner-was-set-in-2019/ https://nwn.blogs.com/nwn/2019/01/blade-runner-los-angeles-2019.html https://www.theregister.co.uk/2019/01/01/blade_runner_today/
  • 7. Common themes in the works of Philip K. Dick • identity • self vs. the other • memory • humanity • authenticity • reality vs. simulacra • unreliable narrator
  • 8. Blade Runner in 279 characters
  • 9. Voight-Kampff Test: distinguishing authentic (humans) vs. fake (replicants) https://www.youtube.com/watch?v=ic0PuvJbdu0 You’re in a desert walking along in the sand when all of the sudden you look down, and you see a tortoise. You reach down, you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?
  • 10. Robots indistinguishable from humans, off-world slaves, perpetually “dark and stormy” Los Angeles – all good cyberpunk sci-fi tropes – but that’s not our 2019, right?
  • 11. The future is already here — it's just not evenly distributed. -- William Gibson (yes, I’m mixing sci-fi authors) https://twitter.com/badnetworker/status/1093864777179430912 https://geekologie.com/2018/02/boston-dynamics-tests-door-opening-robot.php
  • 12. “So when do we get to that part about web archiving?”
  • 13. Web archives are science fiction. Web archives are enabling a reality, as foreseen by PKD and other sci-fi authors, where we can insert bespoke fakes into our collective memory.
  • 14. Web archives are like science fiction because they’re a paradox: We need a significant and continuous technology investment today to be able to say a page “used to look like this.”
  • 15. Web archiving is not file backup. Backup = prevent, detect, repair changes. Web archiving = continuous change to better simulate the past. Web archiving is a simulacrum of the past. https://makeagif.com/gif/blade-runner-jf-sebastians-toys-kaiser-and-bear-AFkWpp
  • 16. The essence of a web archive is to modify its holdings during replay. https://web.archive.org/web/19970626040823/http://www.drexel.edu/ Rewrite links so they point back in the archive. Provide archival metadata banner (what, when, how many). Relatively simple for the Web of 1997. Today, it’s not so easy.
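The link rewriting described on slide 16 can be illustrated with a toy sketch. It assumes the `https://web.archive.org/web/<14-digit timestamp>/<original URL>` layout visible in the slide's own URL; the function name `rewrite_links` and the regex-based approach are illustrative assumptions, not how the Wayback Machine or pywb actually implement replay (real replay systems also handle CSS, Javascript-generated URLs, and much more).

```python
import re
from urllib.parse import urljoin

# Assumed replay prefix, taken from the URL pattern shown on the slide.
ARCHIVE_PREFIX = "https://web.archive.org/web"

# Matches quoted href/src attribute values; a deliberately naive pattern.
LINK_RE = re.compile(r'\b(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html, base_url, timestamp, prefix=ARCHIVE_PREFIX):
    """Rewrite href/src values so they point back into the archive.

    Hypothetical helper: resolves each link against the page's original
    URL, then prepends the archive prefix and capture timestamp.
    """
    def to_archive(match):
        attr, quote, target = match.groups()
        absolute = urljoin(base_url, target)  # resolve relative links
        return f"{attr}={quote}{prefix}/{timestamp}/{absolute}{quote}"

    return LINK_RE.sub(to_archive, html)


if __name__ == "__main__":
    page = '<a href="/about.html">About</a>'
    print(rewrite_links(page, "http://www.drexel.edu/", "19970626040823"))
```

A pattern like this is adequate for the static HTML of 1997; it cannot see links that Javascript constructs at run time, which is one reason replay of modern pages is harder.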
  • 17. Or modify your environment to better emulate the past http://oldweb.today/ie4/19970101000000/www.drexel.edu “Yo Dawg, I heard you like browsers…” https://imgflip.com/i/3rtfws A browser inside the browser, in this case IE4 for Windows (typical for 1997). Network requests trapped & transformed instead of pages. Archival metadata panel
  • 18. Some modifications are to make yesterday’s formats safe for / available to today’s browser http://www.dlib.org/dlib/january05/rosenthal/01rosenthal.html Cf. https://techcrunch.com/2017/07/25/get-ready-to-say-goodbye-to-flash-in-2020/ http://web.archive.org/web/20100605013233/http://www.youtube.com/watch?v=1aPPSIDr3Mc&feature=player_embedded/
  • 19. Web archive software is continuously evolving, in part to better realize a more authentic version of the past https://github.com/internetarchive/wayback/releases https://github.com/webrecorder/pywb/releases
  • 20. "...the government presented testimony from the office manager of the Internet Archive, who explained how the Archive captures and preserves evidence of the contents of the internet at a given time. The witness also compared the screenshots sought to be admitted with true and accurate copies of the same websites maintained in the Internet Archive, and testified that the screenshots were authentic and accurate copies of the Archive’s records. Based on this testimony, the district court found that the screenshots had been sufficiently authenticated." https://law.justia.com/cases/federal/appellate-courts/ca2/17-2479/17-2479-2018-07-02.html Evidentiary use of “screenshots” of archived pages United States v. Gasperini, No. 17-2479 (2d Cir. 2018)
  • 21. Evidentiary use of “screenshots” of archived pages United States v. Gasperini, No. 17-2479 (2d Cir. 2018) "...the government presented testimony from the office manager of the Internet Archive, who explained how the Archive captures and preserves evidence of the contents of the internet at a given time. The witness also compared the screenshots sought to be admitted with true and accurate copies of the same websites maintained in the Internet Archive, and testified that the screenshots were authentic and accurate copies of the Archive’s records. Based on this testimony, the district court found that the screenshots had been sufficiently authenticated." https://law.justia.com/cases/federal/appellate-courts/ca2/17-2479/17-2479-2018-07-02.html Screenshots matching IA’s records are not the same thing as IA’s records matching the past…
  • 22. So why is it so hard to recreate the past? If we just had isolated, static pages (jpegs, pdfs, mp3s, etc.) then there’d be no problem.
  • 23. Real HTML pages are complex: personalization; Javascript (modifying the page); embedded resources (possibly including other HTML pages via iframes); links
  • 24. Javascript is why we can’t have nice (archival) things
  • 25. Load the archived page, get an eagle https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 26. Hit “reload”, get a tiger https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 27. Hit “reload” again, get a mountain https://www.webharvest.gov/congress112th/20130119060624/http://www.fws.gov/
  • 28. “I've done questionable things.”
  • 29. Actually, the fws.gov example was super easy; most changes are much harder to trace Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/ Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
  • 30. Embedded resources + Javascript = Our simulation of what CNN.com looked like then is flawed. It will never be 2013 again, so in some sense that page is lost.
  • 31. Zombies: live web “leaking” into an archived page http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html This page is from 2008; this ad is from 2012 (when this screenshot was taken). As of late 2017, zombies mostly no longer occur https://blog.dshr.org/2017/09/attacking-users-of-wayback-machine.html
  • 32. Temporal violations: reconstructing legitimately archived resources into a page that never existed http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html text (2004-12) says rain, image (2005-09) is clear
  • 33. Incorrectly replaying the 2004 weather forecast for Varina, Iowa is hardly the stuff of dystopian cyberpunk. But there are cases where temporal violations begin to look like tampering…
  • 34. Remember the case of Joy Reid’s blog? https://www.odu.edu/news/2018/5/michael_nelson https://twitter.com/DrDanetteAllen/status/990228054952865793
  • 35. https://twitter.com/phonedude_mln/status/990054945457147904 HTML archived on 2006-01-11; JS archived on 2006-02-07. Reid was a prolific blogger, so a gap of nearly a month is catastrophic for temporal integrity.
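The gap on slide 35 can be checked with a small calculation. This is a minimal sketch that assumes Wayback-style URLs carrying a 14-digit capture timestamp after `/web/`; the two `example.com` URLs are hypothetical stand-ins (the slide gives only the capture dates, so the times are set to midnight), and the helper names are illustrative.

```python
import re
from datetime import datetime

# Wayback-style URLs embed the capture time as 14 digits (YYYYMMDDhhmmss),
# optionally followed by a modifier such as "js_".
TIMESTAMP_RE = re.compile(r"/web/(\d{14})")


def memento_datetime(uri_m):
    """Extract the capture datetime from an archived-page URL."""
    match = TIMESTAMP_RE.search(uri_m)
    if match is None:
        raise ValueError(f"no 14-digit timestamp in {uri_m!r}")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def capture_gap(root_uri_m, embedded_uri_m):
    """Time between the capture of a page and of one embedded resource."""
    return memento_datetime(embedded_uri_m) - memento_datetime(root_uri_m)


if __name__ == "__main__":
    # Hypothetical URLs using the dates from the slide.
    html = "https://web.archive.org/web/20060111000000/http://example.com/"
    js = "https://web.archive.org/web/20060207000000js_/http://example.com/script.js"
    print(capture_gap(html, js).days, "days between HTML and JS captures")
```

A replayed page whose root HTML and embedded Javascript are separated by a gap like this may combine content that never coexisted on the live web.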
  • 36. Not always Javascript – cookies cause the web archive to store the Urdu language page at the URL for the English page https://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
  • 37. https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html Cookies + Javascript = A combo Urdu / Portuguese / English page that never existed
  • 38. Crawling & ingest errors could be exploited to amplify an existing disinformation narrative https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  • 39. Web archives are unreliable narrators. Unreliable narrators cause us to question everything we’ve been told.
  • 40. Let’s prove Lester Holt did not “fudge the tape”! https://twitter.com/AaronBlake/status/1035124642456002565 https://twitter.com/realDonaldTrump/status/1035120511259500544 https://news.vice.com/en_us/article/ne5x3d/trump-lester-holt-james-comey-nbc
  • 41. The May 2017 NBC interview was not archived until August 2018 (and even then, the video itself is not archived) https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/*/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/20180825094239/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila Clicking through to the video reveals a loop of a postal carrier slipping on ice, not the Lester Holt interview.
  • 42. Errors in crawling and playback are hard to distinguish from tampering https://twitter.com/katestarbird/status/911257133231910913 https://er.educause.edu/articles/2018/10/managing-the-cultural-record-in-the-information-warfare-era I want to explicitly note here the difference between the act of quietly rewriting the record and enjoying the results of the rewrites that are accepted as truth and that of deliberately destroying the confidence of the public (including the scholarly community) by creating compromise, confusion, and ambiguity to suggest that the record cannot be trusted.
  • 43. Disinformation applied to web archives doesn’t necessarily mean you have to insert a specific narrative into the archive. You just need to cast doubt on the archive as our collective memory.
  • 44. note: “…both the live Web and the Wayback Machine [...] are reasonably reliable for everyday use” https://blog.dshr.org/2020/03/guest-post-michael-nelsons-response.html https://ws-dl.blogspot.com/2020/03/2020-03-07-at-nexus-of-cni-keynote-and.html
  • 45. We’re unaware of any cases where web archive content has been hacked or faked for any substantive goal. However, web archives are not immune. It’s just that the theater of conflict has yet to expand to include web archives.
  • 46. Twitter then and now http://inventorspot.com/articles/top_ten_twitterati_tweet_above_rest_31806 https://www.vox.com/policy-and-politics/2017/10/19/16504510/ten-gop-twitter-russia
  • 47. Facebook then and now https://twitter.com/Pinboard/status/975013825010458624 https://web.archive.org/web/20090722095954/http://facebook.com/zuck See also: https://www.businessinsider.com/facebook-old-posts-mark-zuckerberg-disappeared-2019-3
  • 48. Gmail then and now http://googlepress.blogspot.com/2004/04/google-gets-message-launches-gmail.html https://www.avanan.com/resources/gmail-exploit-allows-dnc-email-attack
  • 49. Web archives then and soon https://web.archive.org/web/20020601134105/http://www.businessweek.com/technology/content/feb2002/tc20020228_1080.htm ?
  • 50. Why do we expect things to be different for web archives? Our trust model for web archives is still rooted in the 1980s / early 90s.
  • 51. My chronology with Unix. Late 80s: 1 computer, many users (used an X terminal to access Cray, Convex supercomputers). 90s: 1 computer, 1 user (my Sun IPX workstation was the first www.larc.nasa.gov). Now: many computers, 1 user (I’m not even sure how many computers I have access to).
  • 52. From brewster@wais.com Sun Apr 25 00:03:19 1993
    Received: from express.larc.nasa.gov by blearg.larc.nasa.gov with SMTP (5.65.2/server2.4) id AA28277; Sun, 25 Apr 93 00:00:26 -0400
    Received: from wais.wais.com by express.larc.nasa.gov with SMTP id BA21157 (SMTP/Lite-1.15) for <m.l.nelson@larc.nasa.gov>; Sun, 25 Apr 93 00:00:20 -0400
    Received: by wais.wais.com (4.1/SMI-4.1/Brent-911016) id AA14369; Sat, 24 Apr 93 20:47:54 PDT
    Date: Sat, 24 Apr 93 20:47:54 PDT
    Message-Id: <9304250347.AA14369@wais.wais.com>
    From: Brewster Kahle <brewster@wais.com>
    To: abc@concert.net
    To: admin@ds.internic.net
    To: akers@fiddle.oit.unc.edu
    To: anders@ifi.uio.no
    To: anders@munin.ub2.lu.se
    …
    To: m.l.nelson@LaRC.NASA.GOV
    …
    To: root@ds.internic.net
    To: root@ncgia.ucsb.edu
    To: root@fiddle.oit.unc.edu
    To: root@oac.hsc.uth.tmc.edu
    To: root@samba.acs.unc.edu
    To: root@spk41.usace.mil
    To: root@stone.ucs.indiana.edu
    To: root@sunsite.unc.edu
    To: root@uniwa.uwa.oz.au
    To: root@uva.ci.uv.es
    To: root@nic.funet.fi
    …
    WAIS server maintainers,
    As you probably know through wais-discussion, we are announcing the commercial WAIS server this thursday. There is a big press event and showcase at the WAIS Inc offices. Thank you, everyone, for making it possible for us to pull off a startup company. We are considering running a special price for a limited time for those that know and understand WAIS already. We would like to discuss this with those that might be interested in it, and would like to help us determine how it should work. Most people will continue to use the freeware, and that is fine, this is for those that might be interested in a commercial version. At this time, we will not be discussing the differences between things or other products. Given that the press has started to call and ask for information before hand (to scoop this story, you know the press...), we have had to keep a very quiet profile.
    On the other hand, we need the help from all of you. Generally, this is done with a signed non-disclosure basis, but this wont work on the Internet and not in time. What I was thinking was to ask anyone that would like to discuss this, to send an "email non-disclosure" to non-disclosed-waisites-request@wais.com. I wish this weren't so baroque, but you could not believe some of the members of the press I have talked to. If one reporter publishes early, it can spoil things (and get it wrong).
    (please dont email to me. At this point, my cup floweth over. I will dig out after the showcase!)
    -brewster
    TMC->WAIS Inc->AOL->Alexa->IA https://twitter.com/phonedude_mln/status/1105160308866338816
  • 53. (Same email as slide 52, annotated:) When computers were $$$, an email to “root” could be expected to be received by someone entrusted with the necessary $$$ to responsibly administer the machine. IOW, “root” was almost always a white hat. It hasn’t been like that for a long time. Web archives are like the Unix mainframes of today.
  • 54. How well do you know root@archive.org? As in, could you call/email him right now and expect a response? Our entire national digital preservation strategy is predicated on Brewster Kahle “not being evil”™ If he is leading a 25+ year sleeper cell, we’re doomed.
  • 56. How well do you know these roots? Many more: https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives
  • 57. Up until now, we’ve only looked at failures or edge cases in crawling and replay. What about deliberate fakes?
  • 58. Cut-n-paste / mashup “fakes” for humor Victorian Photo Collage https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956) https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’ https://www.youtube.com/watch?v=XlGLhYFrv6w
  • 59. We’ve always had “fakes”, the most convincing of which require significant skills, knowledge, and physical access https://en.wikipedia.org/wiki/Piltdown_Man https://en.wikipedia.org/wiki/Shroud_of_Turin https://www.npr.org/templates/story/story.php?storyId=94461486
  • 60. “deep learning” + “fake” = deepfakes https://motherboard.vice.com/en_us/article/7x799b/selling-ai-generated-fake-porn-is-probably-a-good-way-to-get-sued https://motherboard.vice.com/en_us/article/ev5eba/ai-fake-porn-of-friends-deepfakes
  • 61. Becoming more mainstream: https://twitter.com/MikaelThalen/status/1090349932266094593 https://deepfakesapp.online/ A “safe for work” example: No longer buried in the dark corners of Reddit:
  • 62. Not just porn or novelties: just better revoicing? “Deepfake technology has helped us scale campaign efforts like never before,” Neelkant Bakshi, co-incharge of social media and IT for BJP Delhi, tells VICE. “The Haryanvi videos let us convincingly approach the target audience even if the candidate didn’t speak the language of the voter.” https://www.vice.com/en_in/article/jgedjb/the-first-use-of-deepfakes-in-indian-election-by-bjp
  • 63. That deepfakes are even possible forces us to recalibrate our information literacy https://www.motherjones.com/politics/2019/03/deepfake-gabon-ali-bongo/ “But when Gabon’s government released the video, it raised more questions than it answered. ... One week after the video’s release, Gabon’s military attempted an ultimately unsuccessful coup— the country’s first since 1964— citing the video’s oddness as proof something was amiss with the president.”
  • 64. Or perhaps deepfakes are just the next Photoshop, and skepticism vs. gullibility is more about us than about the media itself. If a fake email can have this much impact, then it is fair to ask: are deepfakes and fake provenance via web archives even necessary? https://twitter.com/i/events/1230892474140430337
  • 65. “Detecting” deepfakes will happen. “Preventing” deepfakes won’t happen; they’re here to stay: Mementos, even of a fake past, are core to the human condition. “Did you get your precious photos?” “Implants. Those aren't your memories, they're somebody else's. They're Tyrell's niece's.” http://deepemotions.free.fr/theme_1.html Real photos, fake memories: replicants attach significant value to photos, even when they know the memories are fake.
  • 66. Next Thanksgiving dinner, liven up the discussion with your extended family: 1. Extract just 0:23–0:26 of the Obama/Peele video 2. Embed in an HTML page 3. Use JavaScript to rewrite the banner and browser URL (Datetime: 2016-11-09; URL: www.whitehouse.gov/totallyNotFake) 4. Claim the deep state deleted the page from the live web https://www.theverge.com/tldr/2018/4/17/17247334/ai-fake-news-video-barack-obama-jordan-peele-buzzfeed https://www.youtube.com/watch?time_continue=43&v=cQ54GDm1eL0#t=0m23s
  • 67. Archives have vulnerabilities.
  • 68. Inserting fakes into real archives. Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been obscured in the video to hide the technique) The content is clearly fake, but it demonstrates that it’s possible to write JavaScript that attacks the archive’s playback capability. It takes an archiving expert to tell the difference.
  • 69. We’ve known about these & other attacks since 2017 http://labs.rhizome.org/presentations/security.html#/ https://acmccs.github.io/papers/p1741-lernerAT3.pdf https://blog.dshr.org/2017/06/wac2017-security-issues-for-web-archives.html https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
  • 70. There are other ways, presumably still hypothetical, to attack the archives https://twitter.com/internetarchive/status/596768668756774914 https://xkcd.com/538/
  • 71. Before you say “that will never happen!” “Planes, trains and fake names: the trail left by Skripal suspects” https://www.theguardian.com/uk-news/2018/sep/05/planes-trains-and-fake-names-the-trail-left-by-skripal-suspects “Surveillance footage shows Saudi 'body double' in Khashoggi's clothes after he was killed, Turkish source says” https://www.cnn.com/2018/10/22/middleeast/saudi-operative-jamal-khashoggi-clothes/index.html Reminder: agents, dissidents, journalists have all disappeared; they won’t mind adding a librarian/sysadmin to the list
  • 72. I’ve got good news and bad news: Setting up a web archive is not as difficult or expensive as it used to be. OpenWayback, WAIL, pywb, et al. + cloud storage = you can have a web archive running in about the same time it took to generate the Steve Buscemi / Jennifer Lawrence deepfake. https://github.com/iipc/openwayback https://github.com/N0taN3rd/wail https://machawk1.github.io/wail/ https://github.com/webrecorder/pywb
  • 73. Inserting fakes into fake archives breitbart.com/wayback/*/whitehouse.gov/totallyNotFake infowars.com/web/*/whitehouse.gov/totallyNotFake iluv.aynrand.org/*/whitehouse.gov/totallyNotFake InternetResearchAgency.ru/whitehouse.gov/totallyNotFake How well do you know root at these archives? Are they really four different archives, or one root for all of them? What if 99.9% of the time they faithfully replay pages?
  • 74. http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html What if we start off with > (n/2)+1 archives compromised?
  • 75. What if the archives were targeted to amplify a specific disinformation narrative? And what if the archives had no choice but to cooperate?
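To make the compromised-majority worry concrete, here is a minimal sketch of a plain majority poll across archives. It is a deliberate simplification and not the LOCKSS polling protocol described in the cited article; the archive names, the digest strings, and the `poll` function are all hypothetical.

```python
from collections import Counter

def poll(votes):
    """Return (digest, count) when a strict majority of archives agree, else None.

    `votes` maps a (hypothetical) archive name to the digest it reports
    for the same memento.
    """
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return (digest, count) if count > len(votes) / 2 else None

REAL = "digest-of-real-page"  # placeholder strings, not real hashes
FAKE = "digest-of-fake-page"

# Five hypothetical archives, three of them compromised from the start.
votes = {"a1": FAKE, "a2": FAKE, "a3": FAKE, "a4": REAL, "a5": REAL}
winner = poll(votes)
print(winner)
```

In this toy setup the fake digest wins the poll, and the two honest archives are the ones that look "damaged". Voting only helps while the honest archives remain the majority.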
  • 76. The University of Farmington is fake. DHS strong-armed a “.edu” registration; they could do the same to IA & others too https://twitter.com/nwarikoo/status/1090726638034276352 https://web.archive.org/web/20161023170733/https://universityoffarmington.edu/ https://twitter.com/phonedude_mln/status/1092464939040755712 First capture: 2016-10-23
  • 77. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf https://blog.dshr.org/search/label/bitcoin
  • 78. There is no shortage of deepfake vs. blockchain stories https://www.wired.com/story/the-blockchain-solution-to-our-deepfake-problems/ https://www.longhash.com/news/the-coming-war-between-deepfakes-and-blockchain
  • 79. A Voight-Kampff Test for deepfakes now seems around the corner https://twitter.com/TechCrunch/status/1009556795965296642 https://www.technologyreview.com/s/611726/the-defense-department-has-produced-the-first-tools-for-catching-deepfakes/
  • 80. Are we prepared for the unintended consequences? “Enforcing digital signatures for all cameras and video devices would offer the same capability in reverse. Suddenly every photograph and video shared online could be traced back to its original owner. Security services in a repressive regime could scour social media for all videos depicting them in a negative light and trace them back to the precise individuals who captured the video, arresting them en masse.” https://www.forbes.com/sites/kalevleetaru/2018/09/09/why-digital-signatures-wont-prevent-deep-fakes-but-will-help-repressive-governments/
  • 81. On the other hand, “blockchaining” our pets is a study in incompatibility, so tracking photos may never happen https://www.aspca.org/about-us/aspca-policy-and-position-statements/microchips https://moviepaws.com/2017/10/22/owls-snakes-and-unicorns-the-animals-of-blade-runner/ In Blade Runner, synthetic pets have serial numbers (real pets are unavailable to all but the richest). “While most of the world has accepted these standards, North America has not. The primary problem is a competitive, technological one involving the compatibility of the microchips and the readers that are used by shelters and veterinary clinics.”
  • 82. As for blockchains and web archives…
  • 83. This is not what you think it is… https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps
  • 84. This is not what you think it is… https://petertodd.org/2017/carbon-dating-the-internet-archive-with-opentimestamps “…right now you can get timestamps for every book, movie, song, computer program, legal document, etc. in the thousands of collections in the archive. In the future we hope to be able to work with the Internet Archive to extend this to timestamping website snapshots…”
  • 85. That’s never going to happen. (at least not 3rd party through the playback interface)
  • 86. Archives Aren’t Magic Web Sites They’re Just Web Sites. https://ws-dl.blogspot.com/2019/08/2019-08-30-where-did-archive-go-part1.html https://ws-dl.blogspot.com/2019/09/2019-09-10-where-did-archive-go-part-2.html https://ws-dl.blogspot.com/2019/09/2019-09-25-where-did-archive-go-part-3.html https://ws-dl.blogspot.com/2019/10/2019-10-21-where-did-archive-go-part-4.html
  • 87. Sample 16k+ Mementos from 17 Web Archives https://arxiv.org/abs/1905.03836
    Archive                  URI-Ms
    -------------------------------
    perma-archives.org          182
    bibalex.org                 199
    webarchive.org.uk           349
    bac-lac.gc.ca               351
    proni.gov.uk                469
    digar.ee                    488
    webharvest.gov              712
    internetmemory.org          979
    nationalarchives.gov.uk     994
    stanford.edu               1222
    archive-it.org             1383
    archive.is                 1396
    web.archive.org            1566
    arquivo.pt                 1569
    webcitation.org            1585
    vefsafn.is                 1589
    loc.gov                    1594
    -------------------------------
    Total                     16627
  • 88. Periodically Replay Each Archived Page Above example: http://perma-archives.org/warc/20170101182813/http://umich.edu/ 35 times, from Nov. 2017 – Oct. 2018 For each replay, we download both the rewritten version and the “raw” version (where possible).
  • 89. Periodically Replay Each Archived Page Above example: http://perma-archives.org/warc/20170101182813/http://umich.edu/ 35 times, from Nov. 2017 – Oct. 2018 For each replay, we download both the rewritten version and the “raw” version (where possible). Partial archive outage because of security / maintenance upgrade
  • 90. Periodically Replay Each Archived Page Above example: http://perma-archives.org/warc/20170101182813/http://umich.edu/ 35 times, from Nov. 2017 – Oct. 2018 For each replay, we download both the rewritten version and the “raw” version (where possible). Post-upgrade, replay is variable.
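As a rough sketch of what requesting the “raw” version can look like: Wayback-style replay systems commonly accept an `id_` modifier after the 14-digit timestamp in a URI-M, which asks for the capture without rewritten links or a banner. Whether a particular archive honors `id_` is an assumption here (hence “where possible”), and the `raw_urim` helper is hypothetical, not the tooling used in the study.

```python
import re

# prefix / 14-digit timestamp / optional two-letter modifier such as "im_" / original URI-R
_URIM = re.compile(r"^(?P<prefix>.*?/)(?P<ts>\d{14})(?P<mod>[a-z]{2}_)?/(?P<urir>.+)$")

def raw_urim(urim):
    """Rewrite a Wayback-style URI-M so that it requests the raw capture via `id_`."""
    m = _URIM.match(urim)
    if m is None:
        raise ValueError(f"not a Wayback-style URI-M: {urim}")
    return f"{m['prefix']}{m['ts']}id_/{m['urir']}"

print(raw_urim("http://perma-archives.org/warc/20170101182813/http://umich.edu/"))
```

Downloading both forms on each visit lets changes introduced by replay-time rewriting be separated from changes in the stored capture itself.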
  • 91. More Archived Pages Changed Every Time Than Never Changed (yes, this experiment used “raw” mode). Never changed: 2007 URI-Ms (1 in 8). Always changed: 2773 URI-Ms (1 in 6). Fixity-based approaches, including blockchain, will not work.
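A minimal sketch of the kind of fixity check behind these counts, assuming the replays of one URI-M have already been downloaded as bytes. The three labels and the consecutive-pair rule for “always changed” are assumptions made for illustration; the study's exact definitions are in the cited paper.

```python
import hashlib

def digest(body):
    """SHA-256 hex digest of one downloaded replay."""
    return hashlib.sha256(body).hexdigest()

def classify(replays):
    """Label one URI-M from its successive replays (a list of bytes objects)."""
    digests = [digest(body) for body in replays]
    # Count how many replays differ from the replay immediately before them.
    changes = sum(1 for prev, cur in zip(digests, digests[1:]) if prev != cur)
    if changes == 0:
        return "never changed"
    if changes == len(digests) - 1:
        return "always changed"
    return "sometimes changed"

print(classify([b"<html>a</html>", b"<html>a</html>", b"<html>b</html>"]))
```

A hash recorded at crawl time only proves something if a later replay can reproduce the same bytes, which is exactly what most of the sampled mementos failed to do.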
  • 92. “Hash the screen shot, not the HTML!” That doesn’t work either.
  • 93. 1 WARC file, 2 Wayback Machines, 3 Browsers = 6 different replays http://wayback.archive-it.org/all/20130106140348/http://www.harvard.edu/ http://web.archive.org/web/20130106140348/http://www.harvard.edu/ see also: https://ws-dl.blogspot.com/2016/12/2016-12-20-archiving-pages-with.html
  • 94. Why not create a LOCKSS for web archives? “The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?”
  • 95. Web archives are not especially interoperable. There are many issues regarding interoperability, but generational loss is a good demonstration of incompatible assumptions about simulating the past.
  • 96. https://web.archive.org/web/20180501125952/https:/twitter.com/phonedude_mln/status/990054945457147904
  • 97. http://archive.is/PaKx6
  • 98. https://perma.cc/3HMS-TB59
  • 99. http://www.webcitation.org/77RhNeyoZ
  • 100. https://web.archive.org/web/20190407024654/https://perma.cc/3HMS-TB59
  • 101. https://web.archive.org/web/20190407031659/http://www.webcitation.org/77RhNeyoZ
  • 102. Web archiving interoperability: a metaphor (non-synthetic pets, possibly microchipped) https://www.youtube.com/watch?v=SQudKvrwDAU
  • 103. To summarize: Existing, trusted archives can be compromised by: 1) crawling malicious pages, 2) attacking facilities / personnel, or 3) court orders. A lowered resource threshold for archives allows: 1) “long game” archives: faithful now, corrupt later; 2) “sock puppet” archives: surreptitiously cooperating archives. The nature of web archives is to change content; current fixity-based approaches will not help.
  • 104. Looking forward: We need new models for web archiving and verifying authenticity. The Heritrix / Wayback Machine technology stack, while successful, has limited our thinking.
  • 105. “Studies generally suggest that, year after year, less than 60 percent of web traffic is human; … For a period of time in 2013, the Times reported this year, a full half of YouTube traffic was “bots masquerading as people,” a portion so high that employees feared an inflection point after which YouTube’s systems for detecting fraudulent traffic would begin to regard bot traffic as real and human traffic as fake. They called this hypothetical event “the Inversion.”” http://nymag.com/intelligencer/2018/12/how-much-of-the-internet-is-fake.html In the IA: robots outnumber humans 10:1 in sessions, 5:4 in HTTP connections, ca. 2012 http://arxiv.org/abs/1309.4016 https://giphy.com/gifs/harrison-ford-blade-runner-sean-young-yjB2fwqjv5rry/media
  • 106. I suspect the core of the new model will have a lot in common with click farms https://twitter.com/mbrennanchina/status/1072114511212109824
  • 107. Record what we saw at crawl time as a baseline; then we need a distance measure between crawl time and replay time http://dx.doi.org/10.5210/fm.v22i112.8097 https://ws-dl.blogspot.com/2013/05/2013-05-25-game-walkthroughs-as.html Documenting instead of archiving… 1) Robotic witnesses 2) New Nielsen families
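The slide does not name a distance measure, so the following is only one possible illustration: Jaccard distance over word shingles of the text seen at crawl time versus the text seen at replay time. The function names and the sample strings are hypothetical, and a real measure would likely also need to account for images, layout, and embedded resources.

```python
def shingles(text, k=3):
    """Set of k-word shingles; a text shorter than k words becomes a single shingle."""
    words = text.split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard_distance(crawl_text, replay_text, k=3):
    """0.0 means identical shingle sets, 1.0 means nothing in common."""
    a, b = shingles(crawl_text, k), shingles(replay_text, k)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

baseline = "breaking news story one"  # what the crawler recorded (made-up text)
replayed = "breaking news story two"  # what a later replay showed (made-up text)
print(round(jaccard_distance(baseline, replayed), 3))
```

Unlike a hash comparison, a distance of this kind degrades gradually, so a small, expected replay difference can be told apart from a wholesale substitution by choosing a threshold.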
  • 108. Some of you might be thinking “but I don’t like Blade Runner – what can I take away from this talk?” (my wife refers to the film as “serious white guys talking”) Two methods for passing the Voight-Kampff Test for Blade Runner fandom
  • 109. 1) Is Deckard a replicant? In the book, he’s definitely human. In the seven (!) versions of the movie, it ranges from “ambiguous” to “replicant”. https://moviepaws.com/2017/10/22/owls-snakes-and-unicorns-the-animals-of-blade-runner/ https://en.wikipedia.org/wiki/Themes_in_Blade_Runner https://en.wikipedia.org/wiki/Blade_Runner#Versions (Hello, FRBR)
  • 110. 2) “Tears in Rain” – Greatest monologue in sci-fi? Or greatest monologue of all time? I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate. All those moments will be lost in time, like tears in rain. Time to die. https://www.youtube.com/watch?v=9hDo80ddn4Q https://en.wikipedia.org/wiki/Tears_in_rain_monologue https://www.youtube.com/watch?v=BM54jXndyvQ
  • 111. 2) “Tears in Rain” – Greatest monologue in sci-fi? Or greatest monologue of all time? I've crawled things you people wouldn't believe. Clickjacking attacks off the x-frame-options: sameorigin. I watched ajax requests redirect at the aggregator TimeGate. All those pages will be lost in time, like tears in rain. Time to lie. https://www.youtube.com/watch?v=9hDo80ddn4Q https://en.wikipedia.org/wiki/Tears_in_rain_monologue https://www.youtube.com/watch?v=BM54jXndyvQ