
Weaponized Web Archives: Provenance Laundering of Short Order Evidence

Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael L. Nelson

Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln

With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein

Published in: Technology

  1. 1. Va Tech CS Seminar, 2018-11-02, @phonedude_mln, @WebSciDL Weaponized Web Archives: Provenance Laundering of Short Order Evidence Michael L. Nelson Old Dominion University Web Science & Digital Libraries Research Group @WebSciDL, @phonedude_mln With: ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter. based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web
  2. 2. TL;DR
      • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
      • Web archives will be weaponized to:
          – alter trustworthy content
          – obfuscate provenance of untrustworthy content
      web archives https://imgur.com/gallery/akeVeiq
  3. 3. background: what’s a web archive?
  5. 5. http://web.archive.org/web/*/http://www.odu.edu/ also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
  7. 7. what was here? we’ll likely never know… (ok, xkcd gives us an idea…)
  8. 8. Sure, go ahead and archive www.odu.edu -- but what about archiving all your Facebook posts, tweets, instagrams, check-ins, etc.?
  9. 9. “Why are they putting all that online?” “And it’s easy to deride this sort of thing as self-absorbed publishing – why would anyone put such drivel out in public? It’s simple. They’re not talking to you. We misread these seemingly inane posts because we’re so unused to seeing written material in public that isn’t intended for us.” Clay Shirky, 2008, p. 85
  10. 10. We have semi-private discussions in public spaces all the time… https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/ Even though we know others can eavesdrop – maybe we even want that – if they whipped out their iPhone and started recording us, it might change our behavior. See also: gevulot, agoras, and exomemory in “The Quantum Thief” https://en.wikipedia.org/wiki/The_Quantum_Thief
  11. 11. Given enough time, our private/public performances become art https://archive.org/details/prelingerhomemovies https://genius.com/Dj-shadow-letter-from-home-lyrics https://www.youtube.com/watch?v=MIR62rreRKY https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/ personally identifiable information!
  12. 12. as the web archiving community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  13. 13. Our attitude about the surveillance state is contextual. https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/ http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html Boston Marathon Bombing, 2013 https://twitter.com/charliespiering/status/976430395964215296
  14. 14. The surveillance state also surveils the (other) state. https://www.theguardian.com/uk-news/2018/sep/05/planes-trains-and-fake-names-the-trail-left-by-skripal-suspects “Planes, trains and fake names: the trail left by Skripal suspects” https://www.cnn.com/2018/10/22/middleeast/saudi-operative-jamal-khashoggi-clothes/index.html “Surveillance footage shows Saudi 'body double' in Khashoggi's clothes after he was killed, Turkish source says”
  15. 15. Meanwhile, we happily pay monthly service fees to be surveilled! https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/ https://twitter.com/mtdukes/status/974281625348558848
  16. 16. “Quis custodiet ipsos custodes?” A: Social media. https://twitter.com/WiredUK/status/958084308924760065 https://twitter.com/vicenews/status/670059493581959168
  17. 17. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils https://twitter.com/safety_refinery/status/934982022078042112 https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html https://twitter.com/documentnow/status/964882665982722048 https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e https://github.com/fivethirtyeight/russian-troll-tweets/ https://blog.twitter.com/official/en_us/topics/company/2018/enabling-further-research-of-information-operations-on-twitter.html
  18. 18. Nor do we feel bad for holding public figures / organizations accountable https://twitter.com/landlibrarian/status/975910915135754240 https://twitter.com/IEEEhistory/status/960358528987942912 http://archive.is/xh58B
  19. 19. But our attitude is different when those organizations explicitly monitor us https://twitter.com/pierce/status/980860438119301120 https://twitter.com/LSJNews/status/979017806116245504
  20. 20. We can & should discuss our role in surveillance, but realize Facebook et al. are operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram) https://twitter.com/zeynep/status/975076957485457408 https://twitter.com/Pinboard/status/975013825010458624 see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
  21. 21. as the web archiving community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
  22. 22. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  23. 23. “We cannot accept this photograph in evidence” http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/ https://twitter.com/acnwala/status/977982456296034304
  24. 24. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time… Victorian Photo Collage https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956) https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’ https://www.youtube.com/watch?v=XlGLhYFrv6w
  25. 25. Crude techniques = humor, sophisticated techniques = deception; Brand’s prediction of “any day now” is now Synthesizing Obama: Learning Lip Sync from Audio SIGGRAPH 2017 https://grail.cs.washington.edu/projects/AudioToObama/ Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016 http://niessnerlab.org/projects/thies2016face.html see also: https://www.youtube.com/watch?v=pkkph4JhrCg
  26. 26. Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media We are completely unprepared for advanced, SIGGRAPH/CVPR techniques
  27. 27. “Surely web archives can be used to establish priority and authenticity?” Let’s look at some examples. cf. https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
  28. 28. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated http://knowyourmeme.com/photos/1338390-black-panther https://twitter.com/TamikaDMallory/status/964701120194019328 See also: https://www.snopes.com/fact-check/mexican-police-caravan-photos/
  29. 29. nydailynews.com provides screenshots, but not links to the tweets… http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  30. 30. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…
      $ curl -I https://twitter.com/AsianWifeHaver
      HTTP/1.1 302 Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 103
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:09:27 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:09:27 GMT
      location: https://twitter.com/account/suspended

      $ curl -I https://twitter.com/DSA_Boi_Pucci
      HTTP/1.1 404 Not Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 6329
      content-security-policy: [deletia]
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:14:22 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:14:22 GMT
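The two header checks on this slide boil down to reading the status code and, for redirects, the Location header. A minimal Python sketch of that reading follows. It assumes the response has already been fetched (for example with `curl -I` as above); the function name `classify_account_response` and its labels are illustrative, not part of any Twitter API.

```python
from typing import Optional


def classify_account_response(status: int, location: Optional[str] = None) -> str:
    """Label a HEAD response for a profile URI (labels are illustrative)."""
    if status == 200:
        return "live"
    # A redirect to the "suspended" page, as in the first response on the slide.
    if status in (301, 302) and location:
        if location.rstrip("/").endswith("/account/suspended"):
            return "suspended"
    # Not found or gone, as in the second response on the slide.
    if status in (404, 410):
        return "missing"
    return "unknown"


if __name__ == "__main__":
    print(classify_account_response(302, "https://twitter.com/account/suspended"))
    print(classify_account_response(404))
```

Neither "suspended" nor "missing" says anything about what the account once posted, which is why the slides turn to the archives next.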
  31. 31. …nor are they in the Internet Archive note: this exists only because of the redirection to the “suspended” page http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  33. 33. Can’t find @DSA_Boi_Pucci in any archive
      Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
      web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
      wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
      perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
      archive.is/twitter.com/DSA_Boi_Pucci
      www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
      wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
      arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
      for a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
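The "prefix + original URI" construction listed on this slide is mechanical, so it can be sketched in a few lines of Python. The prefixes below are copied from the slide; the function name `archive_lookup_uris` is hypothetical, and the sketch only builds the strings. It does not check that each archive still answers at that path.

```python
from typing import List, Sequence

# Prefixes as they appear on the slide (scheme omitted, as on the slide).
ARCHIVE_PREFIXES = (
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
)


def archive_lookup_uris(uri_r: str, prefixes: Sequence[str] = ARCHIVE_PREFIXES) -> List[str]:
    """Build one lookup URI per archive for the original resource uri_r."""
    # Drop the scheme so the output matches the scheme-less form on the slide.
    for scheme in ("https://", "http://"):
        if uri_r.startswith(scheme):
            uri_r = uri_r[len(scheme):]
            break
    return [prefix + uri_r for prefix in prefixes]


if __name__ == "__main__":
    for uri in archive_lookup_uris("https://twitter.com/DSA_Boi_Pucci"):
        print(uri)
```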
  34. 34. At this point, the absence of evidence means I cannot prove that @DSA_Boi_Pucci: 1) ever existed or 2) was not created by nydailynews.com
  35. 35. What if we checked these archives?
      breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
      infowars.com/web/*/twitter.com/DSA_Boi_Pucci
      iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
      InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
      What if they all said “nydailynews.com test account”? Would you trust the results?
  36. 36. Our entire national web preservation strategy is predicated on Brewster Kahle (IA) “not being evil”™ If he is leading a 20+ year sleeper cell, we’re doomed.
  37. 37. Web archives with international implications: Malaysia Airlines Flight 17 (MH17) http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video http://www.newyorker.com/magazine/2015/01/26/cobweb
  39. 39. (not really archived as well as we’d like)
  40. 40. Ed and I Discuss Who Has What… https://twitter.com/phonedude_mln/status/490171976389238784
  41. 41. Remember MH17? https://twitter.com/phonedude_mln/status/490171976389238784
  42. 42. Alex is now 404. Would multiple archives have convinced him? https://twitter.com/quicknquiet
  43. 43. Do we really have “a perfect tool to produce `evidence’ of any kind”?
  44. 44. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  45. 45. (apologies to Notorious B.I.G.) “Mo Archives, Mo Problems” Why? Because they’ll rarely agree. Even a single archive is an unreliable witness: zombies, temporal violations, and attacks
  46. 46. Zombies: live web “leaking” into an archived page http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html this page is from 2008 this ad is from 2012 (when this screen shot was taken)
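One crude way to spot candidates for the leakage described on this slide is to list embedded resources in the replayed HTML whose URIs were not rewritten to point back into the archive. The Python sketch below does only that. `find_leaks` and the example prefix are hypothetical names for illustration, and it inspects static `src` attributes only, so requests generated by JavaScript at replay time (a common source of zombies) would not be caught.

```python
from html.parser import HTMLParser
from typing import List


class _SrcCollector(HTMLParser):
    """Collect the value of every src attribute in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.uris: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                self.uris.append(value)


def find_leaks(replayed_html: str, archive_prefix: str) -> List[str]:
    """Return absolute src URIs that do not start with the archive's prefix."""
    collector = _SrcCollector()
    collector.feed(replayed_html)
    absolute = ("http://", "https://", "//")
    # Relative URIs resolve against the archive itself, so they are not flagged.
    return [
        uri for uri in collector.uris
        if uri.startswith(absolute) and not uri.startswith(archive_prefix)
    ]
```

A flagged URI is only a candidate: some archives deliberately load their own banner scripts from other hosts.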
  47. 47. Temporal violations: reconstructing legitimately archived resources into a page that never existed http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html text (2004-12) says rain, image (2005-09) is clear
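The rain/clear mismatch on this slide comes from composing a page out of captures taken months apart. A much simpler test than a full analysis would need is to compare each embedded resource's capture datetime with that of the root page and flag large gaps. The Python sketch below assumes the capture datetimes are already known; the function name, the 30-day tolerance, and the resource names in the usage example are all made up for illustration (the slide gives only months, so the day values are placeholders).

```python
from datetime import datetime, timedelta
from typing import Dict, List


def temporal_outliers(
    root_captured: datetime,
    embedded_captured: Dict[str, datetime],
    tolerance: timedelta = timedelta(days=30),
) -> List[str]:
    """Return embedded resources captured more than `tolerance` from the root page."""
    return sorted(
        uri
        for uri, captured in embedded_captured.items()
        if abs(captured - root_captured) > tolerance
    )


if __name__ == "__main__":
    root = datetime(2004, 12, 1)  # text says rain
    embedded = {
        "radar.jpg": datetime(2005, 9, 1),   # image shows clear skies
        "style.css": datetime(2004, 12, 3),
    }
    print(temporal_outliers(root, embedded))
```

A large gap does not prove the composite never existed (the image may simply not have changed), so this only narrows down what to inspect.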
  48. 48. Directly attacking the archive (in this case, via orphaned live web resources; “zombie attack”) Lerner, Kohno, Roesner, 2017 https://doi.org/10.1145/3133956.3134042 see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html page is from 2011, iframe content is from 2017 (when screenshot was taken)
  49. 49. Based on feedback from Lerner et al., IA has changed their playback (specifically, with a Content-Security-Policy HTTP response header) But playback remains problematic… (apologies to Peter Arnett) “In order to save the page, we had to completely change it” let’s look at four common scenarios
  50. 50. 1) JavaScript does not run correctly from the archive http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016
  51. 51. 2) Archived page doesn’t match live web experience https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  52. 52. (apologies to Heraclitus) 3) You cannot replay twice the same archived page Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/ Animated GIF: https://blog.dshr.org/2017/11/keynote-at-pacific-neighborhood.html
  53. 53. 4) Archives are not magic web sites; they have the same problems as regular web sites
      Archive                    URIMs   With at least two hashes
      ----------------------------------------------------------------
      webharvest.gov               712     712 (100%)
      archive.is                  1396    1364 (97.70%)
      vefsafn.is                  1589     739 (46.50%)
      archive-it.org              1383     815 (58.92%)
      stanford.edu                1222     831 (68.00%)
      internetmemory.org           979     979 (100%)
      nationalarchives.gov.uk      994     972 (97.78%)
      archive.bibalex.org          199     177 (88.94%)
      bac-lac.gc.ca                351     351 (100%)
      proni.gov.uk                 469     129 (27.50%)
      www.webarchive.org.uk        349     329 (94.26%)
      www.webcitation.org         1585     828 (52.23%)
      veebiarhiiv.digar.ee         488     308 (63.11%)
      webarchive.loc.gov          1594     526 (32.99%)
      arquivo.pt                  1569    1563 (99.61%)
      web.archive.org             1566    1334 (85.18%)
      perma-archives.org           182     180 (98.90%)
      ----------------------------------------------------------------
      Total                      16627   12137 (72.99%)
      Data from 35 downloads over an 11 month period (2017-11 – 2018-10), Mohamed Aturban (in preparation)
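The table's right-hand column can be read as "mementos whose repeated downloads produced at least two distinct hashes". That reading is an assumption here, since the slide does not spell out the method. Under that assumption, a per-archive summary could be computed along these lines in Python; the function name and the input layout (URI-M mapped to the list of hashes seen across downloads) are illustrative.

```python
from typing import Dict, List, Tuple


def summarize_hash_stability(downloads: Dict[str, List[str]]) -> Tuple[int, int, float]:
    """Return (mementos, mementos with >= 2 distinct hashes, percentage)."""
    total = len(downloads)
    changed = sum(1 for hashes in downloads.values() if len(set(hashes)) >= 2)
    percentage = 100.0 * changed / total if total else 0.0
    return total, changed, percentage


if __name__ == "__main__":
    observed = {
        "urim-1": ["aa", "aa", "aa"],   # stable across downloads
        "urim-2": ["aa", "bb", "aa"],   # changed at least once
        "urim-3": ["cc", "dd", "ee"],
        "urim-4": ["ff", "ff", "00"],
    }
    print(summarize_hash_stability(observed))
```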
  54. 54. How can we differentiate between “normal” archive playback modification vs. deception? If the tweets or accts are deleted, we don’t know. If I embed fake tweets in another page, then archive that page, only an expert can tell the fake tweets don’t come from twitter.com (& fake archives will lie!) And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/ https://www.vox.com/2018/10/29/18037880/twitter-may-remove-like-button https://www.bloomberg.com/news/articles/2018-10-27/twitter-apologizes-for-ignoring-apparent-threat-in-tweet These might have been swapped -- but how can you tell for sure?
  55. 55. Inserting fakes into real archives Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been obscured in the video to hide the technique) The content is clearly fake, but imagine replacing: 1) “1992” with a more believable “2016”, 2) the fake domain with “bbc.com”, and 3) Brian Williams rapping with a synthesized Trump or Obama speech.
  56. 56. “That will never happen! …right?”
  57. 57. This isn’t just hypothetical… The opening salvo in what is and isn’t a “deepfake”: https://twitter.com/AaronBlake/status/1035124642456002565 https://twitter.com/realDonaldTrump/status/1035120511259500544 https://news.vice.com/en_us/article/ne5x3d/trump-lester-holt-james-comey-nbc
  58. 58. The May 2017 NBC interview was not archived until August 2018 (and even then, the video itself was not archived) https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/*/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila https://web.archive.org/web/20180825094239/https://www.nbcnews.com/nightly-news/video/pres-trump-s-extended-exclusive-interview-with-lester-holt-at-the-white-house-941854787582?v=raila Clicking through to the video reveals a loop of a postal carrier slipping on ice, not the Lester Holt interview.
  59. 59. Now convince “Alex” that: 1) the live web nbc.com video has not been modified 2) the IA archive failure is not suspicious / convenient 3) the archived “copy” in infowars.com is not authentic
  60. 60. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf
  61. 61. Instead, let’s use web archives to monitor web archives.
  62. 62. Step 1: Push to multiple archives
      eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180321/eaw.rhizome.org
      archive.is/20180321/eaw.rhizome.org
  63. 63. Step 2: Compute fixity, publish fixity “manifest” at a well-known location
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
      manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
      manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
      It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers.
      This example assumes the existence of a well-known server manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
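Step 2 could be sketched in Python roughly as follows. The slide fixes only the URI pattern (manifest.org, then the observation date, then the memento URI) and the idea of hashing things that should not change. Everything else here is an assumption made for illustration: the choice of SHA-256, the `STABLE_HEADERS` selection, the manifest record layout, and all function names.

```python
import hashlib
from typing import Dict, Iterable

# Assumed set of original response headers that should not change on replay.
STABLE_HEADERS = ("content-type", "last-modified")


def compute_fixity(body: bytes, headers: Dict[str, str],
                   stable_headers: Iterable[str] = STABLE_HEADERS) -> str:
    """Hash the raw body plus selected headers; other headers are ignored."""
    lowered = {name.lower(): value for name, value in headers.items()}
    digest = hashlib.sha256(body)
    for name in sorted(stable_headers):
        digest.update(f"\n{name}:{lowered.get(name, '')}".encode("utf-8"))
    return digest.hexdigest()


def build_manifest(uri_m: str, observed: str, body: bytes,
                   headers: Dict[str, str]) -> Dict[str, str]:
    """Assemble a minimal manifest record (layout is hypothetical)."""
    return {"uri_m": uri_m, "observed": observed,
            "sha256": compute_fixity(body, headers)}


def manifest_uri(uri_m: str, observed: str, host: str = "manifest.org") -> str:
    """Follow the URI pattern shown on the slide."""
    return f"{host}/{observed}/{uri_m}"


if __name__ == "__main__":
    uri_m = "web.archive.org/web/20180321/eaw.rhizome.org"
    print(manifest_uri(uri_m, "20180322"))
    print(build_manifest(uri_m, "20180322", b"example image bytes",
                         {"Content-Type": "image/jpeg"}))
```

Headers that vary per request, such as Date, are deliberately left out of the hash so that two honest downloads of an unchanged memento agree.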
  64. 64. Step 3: Wondering about veracity of an archived page? Check manifest.org and recompute fixity.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      what if manifest.org is down? or possibly hacked?
      We can’t know archive.org did not alter contents on ingest (20180321), but we can verify that it has not changed since our observation (20180322)
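The check in Step 3 is a recompute-and-compare. A minimal Python sketch, assuming the manifest publishes a SHA-256 hex digest of the raw resource body (the slide does not name a hash function, and `verify_fixity` is a hypothetical helper):

```python
import hashlib
import hmac


def verify_fixity(current_body: bytes, published_sha256: str) -> bool:
    """Recompute the digest of what the archive serves now and compare."""
    recomputed = hashlib.sha256(current_body).hexdigest()
    # compare_digest is a constant-time comparison; lower() tolerates
    # manifests that publish upper-case hex.
    return hmac.compare_digest(recomputed, published_sha256.lower())


if __name__ == "__main__":
    published = hashlib.sha256(b"bytes observed on 20180322").hexdigest()
    print(verify_fixity(b"bytes observed on 20180322", published))  # unchanged
    print(verify_fixity(b"bytes served today, altered", published))  # changed
```

As the slide notes, a match shows only that the resource has not changed since the manifest was computed, not that the archive captured it faithfully in the first place.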
  65. 65. Step 4: Push manifest to multiple archives
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives. The URIs are ugly, but the bottom line is an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives).
      Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
  66. 66. Wondering about veracity of an archived page? Check all copies of manifest.org and take a majority vote
      web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 1: If I can hack the rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies not 5.
      Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies not 5.
      (yes, this is very similar to textual criticism)
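The majority vote and the independence caveat on this slide can be sketched together in Python. Here each source reports the digest it holds for the manifest (or `None` if it has no copy), and sources run by the same operator are collapsed into a single vote. The function name, the grouping labels, and the rule that the first digest seen within a group stands for the whole group are simplifying assumptions, not something the slides specify.

```python
from collections import Counter
from typing import Dict, Optional


def majority_fixity(copies: Dict[str, Optional[str]],
                    same_operator: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the digest backed by a strict majority of independent sources."""
    same_operator = same_operator or {}
    votes: Dict[str, str] = {}
    for host, digest in copies.items():
        if digest is None:
            continue  # this source has no copy of the manifest
        group = same_operator.get(host, host)
        votes.setdefault(group, digest)  # one vote per operator
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count > len(votes) / 2 else None


if __name__ == "__main__":
    copies = {
        "manifest.org": "aaa",
        "web.archive.org": "bbb",          # possibly tampered
        "wayback.archive-it.org": "bbb",   # same operator as web.archive.org
        "arquivo.pt": "aaa",
        "archive.is": "aaa",
    }
    operators = {"web.archive.org": "IA", "wayback.archive-it.org": "IA"}
    print(majority_fixity(copies, operators))
```

Returning `None` on a tie or when no copies exist keeps "no answer" distinct from "verified", which matches the slide's point that missing fixity information is inconclusive.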
  67. 67. No fixity information? Maybe it’s ok, maybe it’s not. infowars.com/web/20180321/eaw.rhizome.org 404 404 404 404 404 or perhaps fixity was computed and stored at infowars.com; you have to decide if you trust that site. see also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  68. 68. Conclusions
      • See Melanie Ehrenkranz’s article for good news: https://gizmodo.com/how-archivists-could-stop-deepfakes-from-rewriting-hist-1829666009
      • I, however, bring mostly bad news:
          – The web will be the primary vector for increasingly sophisticated disinformation
          – Web archives can be used to forge or obscure the provenance of this information
          – Vagaries of archive playback mean:
              – naïve fixity approaches will not work
              – an archive is not always a reliable witness
          – archives are vulnerable to attack from the pages they crawl
          – “Fake” archives are easy to set up & proliferate
          – Brian Williams (1992) is the OG, not Snoop Dogg (1993)