Weaponized Web Archives: Provenance Laundering of Short Order Evidence

Michael L. Nelson

Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln

With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein

National Forum on Ethics and Archiving the Web
2018-03-23, #eaw18, @phonedude_mln

  1. National Forum on Ethics and Archiving the Web, 2018-03-23, #eaw18, @phonedude_mln
     Weaponized Web Archives: Provenance Laundering of Short Order Evidence
     Michael L. Nelson, Old Dominion University, Web Science & Digital Libraries Research Group, @WebSciDL, @phonedude_mln
     With:
     ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
     Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein
     Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter.
  2. TL;DR
     • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video
     • Web archives will be weaponized to:
       – alter trustworthy content
       – obfuscate provenance of untrustworthy content
     web archives: https://imgur.com/gallery/akeVeiq
  3. As a community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  4. Our attitude about the surveillance state is contextual.
     https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/
     http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html
     Boston Marathon Bombing, 2013
     https://twitter.com/charliespiering/status/976430395964215296
  5. Given enough time, it becomes art
     https://archive.org/details/prelingerhomemovies
     https://genius.com/Dj-shadow-letter-from-home-lyrics
     https://www.youtube.com/watch?v=MIR62rreRKY
     personally identifiable information!
  6. Meanwhile, we happily pay monthly service fees to be surveilled!
     https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/
     https://twitter.com/mtdukes/status/974281625348558848
  7. “Quis custodiet ipsos custodes?” A: Social media.
     https://twitter.com/WIRED/status/958350367468683267
     https://twitter.com/vicenews/status/670059493581959168
  8. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils
     https://twitter.com/safety_refinery/status/934982022078042112
     https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
     https://twitter.com/documentnow/status/964882665982722048
     https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
  9. Nor do we feel bad for holding public figures / organizations accountable
     https://twitter.com/landlibrarian/status/975910915135754240
     https://twitter.com/IEEEhistory/status/960358528987942912
     http://archive.is/xh58B
  10. We can & should discuss our role in surveillance, but realize Facebook is operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram)
      https://twitter.com/zeynep/status/975076957485457408
      https://twitter.com/Pinboard/status/975013825010458624
      see also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
  11. As a community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
  12. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  13. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time…
      Victorian Photo Collage https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage
      “The Flying Saucer” (1956) https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg
      Brian Williams Raps ‘Gin & Juice’ https://www.youtube.com/watch?v=XlGLhYFrv6w
  14. Crude techniques = humor, sophisticated techniques = deception; Brand’s prediction of “any day now” is now
      Synthesizing Obama: Learning Lip Sync from Audio, SIGGRAPH 2017 https://grail.cs.washington.edu/projects/AudioToObama/
      Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016 http://niessnerlab.org/projects/thies2016face.html
      see also: https://www.youtube.com/watch?v=pkkph4JhrCg
  15. What does this have to do with the web?
      Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media
      We are completely unprepared for advanced, SIGGRAPH/CVPR techniques
  16. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated
      http://knowyourmeme.com/photos/1338390-black-panther
      https://twitter.com/TamikaDMallory/status/964701120194019328
  17. nydailynews.com provides screenshots, but not links to the tweets…
      http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  18. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…

      $ curl -I https://twitter.com/AsianWifeHaver
      HTTP/1.1 302 Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 103
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:09:27 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:09:27 GMT
      location: https://twitter.com/account/suspended

      $ curl -I https://twitter.com/DSA_Boi_Pucci
      HTTP/1.1 404 Not Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 6329
      content-security-policy: [deletia]
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:14:22 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:14:22 GMT
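
The two responses above differ in a way that is easy to check mechanically: one account redirects to a “suspended” page, the other returns a 404. A minimal sketch of that classification follows, assuming only the behavior visible in the 2018 curl output; the function name `classify_account` and its labels are hypothetical, and the rules may not match how the site responds today.

```python
from typing import Mapping


def classify_account(status: int, headers: Mapping[str, str]) -> str:
    """Classify a profile URL from the status code and headers of a HEAD response.

    The rules mirror only the two responses shown on the slide (assumption):
    a 302 to .../account/suspended, and a plain 404.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location", "")

    if status == 302 and location.endswith("/account/suspended"):
        return "suspended"
    if status == 404:
        return "not found"
    if status == 200:
        return "live"
    return "unknown"
```

For example, `classify_account(302, {"location": "https://twitter.com/account/suspended"})` yields `"suspended"`, and `classify_account(404, {})` yields `"not found"`. Neither result says anything about whether the account ever posted the tweets in question, which is the gap the following slides explore.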
  19. …nor are they in the Internet Archive
      note: this exists only because of the redirection to the “suspended” page
      http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver
      http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  20. [slide contains no transcribed text]
  21. Can’t find @DSA_Boi_Pucci in any archive
      Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
      web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
      wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
      perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
      archive.is/twitter.com/DSA_Boi_Pucci
      www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
      wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
      arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
      for a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
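
The “archive prefix + original URI” pattern above can be sketched in a few lines. This is only an illustration: the prefixes are copied from the slide, the function name `candidate_uris` is hypothetical, and no claim is made that every archive still answers at these paths.

```python
from typing import List

# Prefixes copied from the slide; each is prepended to the original URI.
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def candidate_uris(original: str) -> List[str]:
    """Build one lookup URI per archive for the given original URI."""
    # Drop the scheme so the result matches the scheme-less form on the slide.
    bare = original.split("://", 1)[-1]
    return [prefix + bare for prefix in ARCHIVE_PREFIXES]
```

Calling `candidate_uris("https://twitter.com/DSA_Boi_Pucci")` reproduces the seven lookups listed on the slide. The same construction works for any hostname, which is why the next slide asks what happens when the prefix belongs to a site you have no reason to trust.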
  22. What if we checked these archives? What if they all agreed?
      breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
      infowars.com/web/*/twitter.com/DSA_Boi_Pucci
      iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
      InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
      Would you trust the results?
  23. Our entire national digital preservation strategy is predicated on Brewster Kahle “not being evil”™. If he is leading a 20+ year sleeper cell, we’re doomed.
  24. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  25. However, even with a single web archive, there can be problems: zombies, temporal violations, and attacks
  26. Zombies: live web “leaking” into an archived page
      http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
      this page is from 2008; this ad is from 2012 (when this screen shot was taken)
  27. Temporal violations: reconstructing legitimately archived resources into a page that never existed
      http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
      text (2004-12) says rain, image (2005-09) is clear
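
A crude way to spot candidates for this kind of mismatch is to compare the capture time of the root page with the capture times of its embedded resources. The sketch below is only a rough proxy, not the evaluation method described in the linked post: the function names, the 30-day tolerance, and the example timestamps are all hypothetical, and it assumes the 14-digit `YYYYMMDDhhmmss` timestamps commonly seen in Wayback-style URIs.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit Wayback-style timestamp such as 20041215000000."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def temporal_outliers(
    root_ts: str,
    embedded: Dict[str, str],
    tolerance: timedelta = timedelta(days=30),  # hypothetical threshold
) -> List[str]:
    """Return embedded resources captured too far from the root page's capture."""
    root = parse_ts(root_ts)
    return [
        uri
        for uri, ts in embedded.items()
        if abs(parse_ts(ts) - root) > tolerance
    ]
```

With a root page captured at `20041215000000`, an image captured at `20050915000000`, and a stylesheet captured at `20041216000000`, only the image is flagged. A large gap does not prove the composite never existed, and a small gap does not prove it did; it only tells you where to look.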
  28. Directly attacking the archive (in this case, via orphaned live web resources; “zombie attack”)
      Lerner, Kohno, Roesner, 2017 https://doi.org/10.1145/3133956.3134042
      see also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html
      page is from 2011, iframe content is from 2017 (when screenshot was taken)
  29. Based on feedback from Lerner et al., IA has changed their playback (specifically, with a Content-Security-Policy HTTP response header)
      But playback remains problematic… “In order to save the page, we had to completely change it” (apologies to Peter Arnett)
      let’s look at four common scenarios
  30. 1) JavaScript does not run correctly from the archive
      http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
      This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016
  31. 2) Archived page renders differently each time
      Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/
  32. 3) Archive modifies pages that should stay the same – goodbye conventional fixity checks!
      Mohamed Aturban, unpublished, embedding memento:
      http://perma-archives.org/warc/20170101182813/http://umich.edu/
      http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
  33. 4) Archived page doesn’t match live web experience
      https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change
      http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html
      “only a ‘crisis actor’ would tweet in Slovak!”
      Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  34. How can we differentiate between “normal” modification for playback vs. deception?
      These might have been swapped -- but how can you tell for sure?
      If the tweets or accts are deleted, we don’t know. If I embed fake tweets in another page, it’s even more confusing.
      And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
  35. You cannot trust the URL in your browser!
      Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg.
      John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been faked in video)
      The content is clearly fake, but imagine replacing:
      1) “1992” with a more believable “2016”,
      2) the fake domain with “bbc.com”, and
      3) Brian Williams rapping with a synthesized Trump or Obama speech.
  36. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> nope.
      https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/
      https://eprint.iacr.org/2017/375.pdf
  37. Instead, let’s use web archives to monitor web archives.
  38. Step 1: Push to multiple archives
      web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180321/eaw.rhizome.org
      archive.is/20180321/eaw.rhizome.org
      eaw.rhizome.org
  39. Step 2: Compute fixity, publish fixity “manifest” at a well-known location
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
      manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
      manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
      It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers.
      This example assumes the existence of a well-known server manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
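
Step 2 can be sketched as hashing the parts of a memento that should not change and filing the result under a manifest URI. This is an illustration under stated assumptions, not the presenter's implementation: the function names, the choice of SHA-256, the JSON-like entry layout, and the `STABLE_HEADERS` set are all hypothetical. Only the `manifest.org/<observation date>/<URI-M>` pattern comes from the slide.

```python
import hashlib
from typing import Dict

# Hypothetical choice of headers treated as stable; the slide only says
# "certain original HTTP response headers".
STABLE_HEADERS = {"content-type", "last-modified"}


def fixity_entry(uri_m: str, body: bytes, headers: Dict[str, str]) -> dict:
    """Hash an unrewritten resource body (e.g. a JPEG) and keep selected headers."""
    stable = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in STABLE_HEADERS
    }
    return {
        "uri_m": uri_m,
        "sha256": hashlib.sha256(body).hexdigest(),
        # Sort so the same inputs always produce the same entry.
        "headers": dict(sorted(stable.items())),
    }


def manifest_uri(observed: str, uri_m: str) -> str:
    """Place the manifest at manifest.org/<observation date>/<URI-M>."""
    return f"manifest.org/{observed}/{uri_m}"
```

For instance, `manifest_uri("20180322", "web.archive.org/web/20180321/eaw.rhizome.org")` gives the first manifest URI listed on the slide. Verification later means fetching the same bytes again, recomputing `fixity_entry`, and comparing it to the published one.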
  40. Wondering about veracity of an archived page? Check manifest.org and recompute fixity.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      what if manifest.org is down? or possibly hacked?
      We can’t know archive.org did not alter contents on ingest (20180321), but we can verify that it has not changed since our observation (20180322)
  41. Step 4: Push manifest to multiple archives
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives. The URIs are ugly, but the bottom line is an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives)
      Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
  42. Wondering about veracity of an archived page? Check all copies of manifest.org and take a majority vote
      web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 1: If I can hack rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies not 5.
      Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies not 5.
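
The majority vote, including the point in Caveat 2 that copies held by the same operator should not count twice, can be sketched as follows. Everything here is an assumption made for illustration: the function name `majority_digest`, the idea of passing an operator map, and the rule that the first copy seen represents its operator. To honor Caveat 1, a caller could simply leave the copy held by the archive under suspicion out of `copies`.

```python
from collections import Counter
from typing import Dict, Optional


def majority_digest(
    copies: Dict[str, str],
    operator_of: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Return the digest backed by a strict majority of independent operators.

    copies maps a host to the digest found in its copy of the manifest.
    operator_of maps a host to its operator; hosts not listed are treated
    as their own operator. Returns None when no digest has a strict majority.
    """
    operator_of = operator_of or {}
    votes: Dict[str, str] = {}
    for host, digest in copies.items():
        operator = operator_of.get(host, host)
        # One vote per operator; the first copy seen stands for the operator
        # (a simplification: disagreement within an operator is not flagged).
        votes.setdefault(operator, digest)
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count > len(votes) / 2 else None
```

With `manifest.org`, `arquivo.pt`, and `archive.is` reporting one digest and both `web.archive.org` and `wayback.archive-it.org` reporting another, mapping the last two hosts to a single operator leaves four votes, three of which agree, so the first digest wins. A two-way tie returns `None`, which is the Segal’s Law situation from earlier in the talk.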
  43. No fixity information? Maybe it’s ok, maybe it’s not.
      infowars.com/web/20180321/eaw.rhizome.org
      404 404 404 404 404
      or perhaps fixity was computed and stored at freedomfries.org; you have to decide if you trust that site.
      see also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  44. Conclusions
      • Bad news:
        – The web will be the primary vector for increasingly sophisticated disinformation
        – Web archives can be used to forge or obscure the provenance of this information
        – Brian Williams predates Snoop Dogg
      • Good news:
        – Web archives have a role in authenticating who said what, and when
        – We should have a web archiving presence at: June 7-8, 2018, NYC: https://www.fakenewshorrorshow.org/
