Weaponized Web Archives: Provenance Laundering of Short Order Evidence

Michael L. Nelson

Old Dominion University
Web Science & Digital Libraries Research Group
@WebSciDL, @phonedude_mln

With:
ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas
Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein

ODU Computer Science Colloquium 2018-04-06

based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web

Published in: Science

  1. ODU CS Colloquium, 2018-04-06, @phonedude_mln. Weaponized Web Archives: Provenance Laundering of Short Order Evidence. Michael L. Nelson, Old Dominion University, Web Science & Digital Libraries Research Group, @WebSciDL, @phonedude_mln. With: ODU: Michele C. Weigle, Mohamed Aturban, John Berlin, Sawood Alam, Plinio Vargas; Los Alamos National Laboratory: Herbert Van de Sompel, Martin Klein. Supported in part by The Andrew Mellon Foundation. Opinions expressed are those of the presenter. Based on a 2018-03-23 presentation at the National Forum on Ethics and Archiving the Web.
  2. TL;DR • We are on the cusp of a “Photoshop” moment for synthesizing convincing audio/video. • Web archives will be weaponized to: – alter trustworthy content; – obfuscate the provenance of untrustworthy content. Image labeled “web archives”: https://imgur.com/gallery/akeVeiq
  3. Background: what’s a web archive?
  4. (no slide text)
  5. http://web.archive.org/web/*/http://www.odu.edu/ also: http://whatdiditlooklike.mementoweb.org/tagged/odu.edu
  6. (no slide text)
  7. What was here? We’ll likely never know… (OK, xkcd gives us an idea…)
  8. Sure, go ahead and archive www.odu.edu -- but what about archiving all your Facebook posts, tweets, Instagrams, check-ins, etc.?
  9. “Why are they putting all that online?” “And it’s easy to deride this sort of thing as self-absorbed publishing – why would anyone put such drivel out in public? It’s simple. They’re not talking to you. We misread these seemingly inane posts because we’re so unused to seeing written material in public that isn’t intended for us.” (Clay Shirky, 2008, p. 85)
  10. We have semi-private discussions in public spaces all the time… Even though we know others can eavesdrop – maybe we even want that – if they whipped out their iPhone and started recording us, it might change our behavior. https://www.nytimes.com/2017/09/19/us/politics/isnt-that-the-trump-lawyer-a-reporters-accidental-scoop.html https://well.blogs.nytimes.com/2013/06/21/how-the-hum-of-a-coffee-shop-can-boost-creativity/
  11. As the web archiving community, we are constantly asking ourselves: “Are we creating tools that aid the surveillance state?” Spoiler alert: Yes.
  12. Our attitude about the surveillance state is contextual. https://www.cbsnews.com/pictures/boston-marathon-bombing-iconic-images/ http://www.boston.com/metrodesk/2013/04/20/boston-police-commissioner-edward-davis-says-releasing-photos-was-turning-point-boston-marathon-bomb-probe/sojcZNcTCGah8UYBnRuk9O/story.html Boston Marathon Bombing, 2013 https://twitter.com/charliespiering/status/976430395964215296
  13. Given enough time, it becomes art. https://archive.org/details/prelingerhomemovies https://genius.com/Dj-shadow-letter-from-home-lyrics https://www.youtube.com/watch?v=MIR62rreRKY https://www.youtube.com/watch?v=fKjg1HfZfPM#t=2m46s https://www.sinecurebooks.com/shop/enjoy-the-experience-bundle/ Personally identifiable information!
  14. Meanwhile, we happily pay monthly service fees to be surveilled! https://www.citiusminds.com/blog/home-automation-with-smart-speakers-amazon-echo-vs-google-home-vs-apple-homepod/ https://twitter.com/mtdukes/status/974281625348558848
  15. “Quis custodiet ipsos custodes?” A: Social media. https://twitter.com/WIRED/status/958350367468683267 https://twitter.com/vicenews/status/670059493581959168
  16. We don’t feel too bad when we archive accounts that later prove to be trolls / sockpuppets / sybils. https://twitter.com/safety_refinery/status/934982022078042112 https://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html https://twitter.com/documentnow/status/964882665982722048 https://news.docnow.io/blacktivists-in-the-archive-71c807aa247e
  17. Nor do we feel bad for holding public figures / organizations accountable. https://twitter.com/landlibrarian/status/975910915135754240 https://twitter.com/IEEEhistory/status/960358528987942912 http://archive.is/xh58B
  18. But our attitude is different when those organizations explicitly monitor us. https://twitter.com/pierce/status/980860438119301120 https://twitter.com/LSJNews/status/979017806116245504
  19. We can & should discuss our role in surveillance, but realize that Facebook et al. are operating as designed (and before you say “I’m too cool for Facebook”, remember that Facebook owns Instagram). https://twitter.com/zeynep/status/975076957485457408 https://twitter.com/Pinboard/status/975013825010458624 See also: data as toxic waste http://idlewords.com/talks/haunted_by_data.htm
  20. As the web archiving community, we should be asking ourselves: “Can we authenticate web content?” Spoiler alert: Yes. A bit.
  21. Stewart Brand, The Media Lab: Inventing the Future at MIT, 1987, p. 201
  22. “We cannot accept this photograph in evidence” http://www.politifact.com/florida/statements/2018/mar/27/blog-posting/david-hogg-not-school-during-shooting-s-fake-news/ https://twitter.com/acnwala/status/977982456296034304
  23. Granted, we’ve had obvious, cut-n-paste / mashup “evidence” for a long time… Victorian Photo Collage: https://www.metmuseum.org/exhibitions/listings/2010/victorian-photocollage “The Flying Saucer” (1956): https://en.wikipedia.org/wiki/The_Flying_Saucer_(song) https://www.youtube.com/watch?v=XCrn6QXvHLg Brian Williams Raps ‘Gin & Juice’: https://www.youtube.com/watch?v=XlGLhYFrv6w
  24. Crude techniques = humor, sophisticated techniques = deception; Brand’s prediction of “any day now” is now. Synthesizing Obama: Learning Lip Sync from Audio, SIGGRAPH 2017: https://grail.cs.washington.edu/projects/AudioToObama/ Face2Face: Real-time Face Capture and Reenactment of RGB Videos, CVPR 2016: http://niessnerlab.org/projects/thies2016face.html See also: https://www.youtube.com/watch?v=pkkph4JhrCg
  25. What does this have to do with the web? Clumsy, “collage/flying saucer/gin & juice” techniques are already effective on social media. We are completely unprepared for advanced, SIGGRAPH/CVPR techniques.
  26. Neo-Nazis and “Black Panther” Relationship Status: It’s Complicated http://knowyourmeme.com/photos/1338390-black-panther https://twitter.com/TamikaDMallory/status/964701120194019328
  27. nydailynews.com provides screenshots, but not links to the tweets… http://www.nydailynews.com/entertainment/movies/trolls-lying-assaults-black-panther-showings-article-1.3824901
  28. @AsianWifeHaver and @DSA_Boi_Pucci are not on the live web…
      $ curl -I https://twitter.com/AsianWifeHaver
      HTTP/1.1 302 Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 103
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:09:27 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:09:27 GMT
      location: https://twitter.com/account/suspended
      $ curl -I https://twitter.com/DSA_Boi_Pucci
      HTTP/1.1 404 Not Found
      cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
      content-length: 6329
      content-security-policy: [deletia]
      content-type: text/html;charset=utf-8
      date: Sat, 17 Mar 2018 22:14:22 GMT
      expires: Tue, 31 Mar 1981 05:00:00 GMT
      last-modified: Sat, 17 Mar 2018 22:14:22 GMT
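The two responses above differ in a way that matters for later archive lookups: one account redirects to a "suspended" page, the other is simply gone. A minimal Python sketch of telling them apart, assuming only the status codes and `location` header shown on the slide; the function name and the returned labels are hypothetical, and Twitter's behavior may have changed since 2018.

```python
from typing import Optional


def classify_account_response(status: int, location: Optional[str] = None) -> str:
    """Label a HEAD response for a profile URL, based on the slide's two examples."""
    if status == 200:
        return "live"
    # The suspended account answered 302 with a Location pointing at /account/suspended.
    if status in (301, 302) and location is not None:
        if location.rstrip("/").endswith("/account/suspended"):
            return "suspended"
    # The deleted (or never-existing) account answered 404.
    if status == 404:
        return "not found"
    return "unknown"
```

For example, `classify_account_response(302, "https://twitter.com/account/suspended")` yields the "suspended" label, while a bare 404 yields "not found".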
  29. …nor are they in the Internet Archive. Note: this exists only because of the redirection to the “suspended” page. http://web.archive.org/web/*/https://twitter.com/AsianWifeHaver http://web.archive.org/web/*/https://twitter.com/DSA_Boi_Pucci
  30. (no slide text)
  31. Can’t find @DSA_Boi_Pucci in any archive. Typical archive URI construction: archive.example.org/SomeString/CNN.com/travel
      web.archive.org/web/*/twitter.com/DSA_Boi_Pucci
      wayback.archive-it.org/all/*/twitter.com/DSA_Boi_Pucci
      perma-archives.org/warc/twitter.com/DSA_Boi_Pucci
      archive.is/twitter.com/DSA_Boi_Pucci
      www.webarchive.org.uk/wayback/archive/twitter.com/DSA_Boi_Pucci
      wayback.vefsafn.is/wayback/twitter.com/DSA_Boi_Pucci
      arquivo.pt/wayback/twitter.com/DSA_Boi_Pucci
      For a full list of public web archives, see: http://labs.mementoweb.org/aggregator_config/archivelist.xml
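The "archive prefix + original URI" pattern above is mechanical enough to sketch in Python. The prefixes below are copied from the slide as written (scheme-less, 2018-era); whether each archive still answers at that path is an assumption, and `candidate_archive_uris` is a hypothetical helper name.

```python
from typing import List

# Prefixes exactly as listed on the slide; not verified against current endpoints.
ARCHIVE_PREFIXES = [
    "web.archive.org/web/*/",
    "wayback.archive-it.org/all/*/",
    "perma-archives.org/warc/",
    "archive.is/",
    "www.webarchive.org.uk/wayback/archive/",
    "wayback.vefsafn.is/wayback/",
    "arquivo.pt/wayback/",
]


def candidate_archive_uris(original: str) -> List[str]:
    """Build one lookup URI per archive for the given original URI."""
    # Drop the scheme, if any, to match the scheme-less form used on the slide.
    target = original.split("://", 1)[-1]
    return [prefix + target for prefix in ARCHIVE_PREFIXES]
```

Calling `candidate_archive_uris("https://twitter.com/DSA_Boi_Pucci")` reproduces the seven URIs listed on the slide.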
  32. What if we checked these archives? What if they all agreed?
      breitbart.com/wayback/*/twitter.com/DSA_Boi_Pucci
      infowars.com/web/*/twitter.com/DSA_Boi_Pucci
      iluv.aynrand.org/*/twitter.com/DSA_Boi_Pucci
      InternetResearchAgency.ru/twitter.com/DSA_Boi_Pucci
      Would you trust the results?
  33. Our entire national digital preservation strategy is predicated on Brewster Kahle (IA) “not being evil”™. If he is leading a 20+ year sleeper cell, we’re doomed.
  34. Malaysia Airlines Flight 17 (MH17) http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video http://www.newyorker.com/magazine/2015/01/26/cobweb
  35. (no slide text)
  36. (not really archived as well as we’d like)
  37. Ed and I Discuss Who Has What… https://twitter.com/phonedude_mln/status/490171976389238784
  38. Remember MH17? https://twitter.com/phonedude_mln/status/490171976389238784
  39. Alex is now 404. Would multiple archives have convinced him? https://twitter.com/quicknquiet
  40. Do we really have “a perfect tool to produce `evidence’ of any kind”?
  41. Segal’s Law, restated for web archives: The person with an archive knows what the page looked like. The person with two archives is never sure.
  42. “Mo Archives, Mo Problems” (apologies to Notorious B.I.G.) Why? Because they’ll rarely agree. Even a single archive is an unreliable witness: zombies, temporal violations, and attacks.
  43. Zombies: live web “leaking” into an archived page. http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html This page is from 2008; this ad is from 2012 (when this screenshot was taken).
  44. Temporal violations: reconstructing legitimately archived resources into a page that never existed. http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html The text (2004-12) says rain; the image (2005-09) is clear.
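One rough way to notice the kind of mismatch on the temporal-violation slide above is to compare capture timestamps of the root page and its embedded resources. The Python sketch below assumes Wayback-style URI-Ms carrying a 14-digit timestamp (optionally followed by a two-letter modifier such as `im_` or `id_`); the 30-day threshold and both function names are hypothetical. A large gap is only a hint, not proof that the composite page never existed.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

# 14-digit timestamp path segment, optionally followed by a modifier like "im_".
_TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def capture_time(urim: str) -> Optional[datetime]:
    """Extract the capture datetime from a Wayback-style URI-M, if present."""
    match = _TIMESTAMP.search(urim)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def suspicious_embeds(
    root_urim: str,
    embedded_urims: Iterable[str],
    max_gap: timedelta = timedelta(days=30),  # arbitrary illustrative threshold
) -> Dict[str, timedelta]:
    """Return embedded URI-Ms captured more than max_gap away from the root."""
    root_time = capture_time(root_urim)
    if root_time is None:
        raise ValueError("root URI-M has no 14-digit timestamp")
    gaps: Dict[str, timedelta] = {}
    for urim in embedded_urims:
        embedded_time = capture_time(urim)
        if embedded_time is not None and abs(embedded_time - root_time) > max_gap:
            gaps[urim] = abs(embedded_time - root_time)
    return gaps
```

With a root captured in 2004-12 and an image captured in 2005-09, as in the slide's example, the image would be flagged while a stylesheet captured seconds after the root would not.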
  45. Directly attacking the archive (in this case, via orphaned live web resources; “zombie attack”). Lerner, Kohno, Roesner, 2017 https://doi.org/10.1145/3133956.3134042 See also: Cushman & Kreymer http://labs.rhizome.org/presentations/security.html The page is from 2011; the iframe content is from 2017 (when the screenshot was taken).
  46. Based on feedback from Lerner et al., IA has changed their playback (specifically, with a Content-Security-Policy HTTP response header). But playback remains problematic… “In order to save the page, we had to completely change it” (apologies to Peter Arnett). Let’s look at four common scenarios.
  47. 1) JavaScript does not run correctly from the archive. http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html This is cnn.com not replaying; it hasn’t replayed correctly since November 1, 2016.
  48. 2) Archived page renders differently each time. Mohamed Aturban, unpublished, memento: http://web.archive.org/web/20130724144801/http://www.cnn.com/
  49. 3) Archive modifies pages that should stay the same – goodbye conventional fixity checks! Mohamed Aturban, unpublished, embedding memento: http://perma-archives.org/warc/20170101182813/http://umich.edu/ http://perma-archives.org/warc/20170101182814id_/http://umich.edu/includes/image/type/gallery/id/113/name/ResearchDIL-19Aug14_DM%28136%29.jpg/width/152/height/152/mode/minfit/
  50. 4) Archived page doesn’t match live web experience. https://web.archive.org/web/20180302184025/https:/twitter.com/Emma4Change http://ws-dl.blogspot.com/2018/03/2018-03-21-cookies-are-why-your.html “only a ‘crisis actor’ would tweet in Slovak!” Now imagine she gets fed up, deletes her account, and then someone applies the “abandoned acct / archive” attack Justin Littman described: https://gwu-libraries.github.io/sfm-ui/posts/2017-11-06-vulnerabilities
  51. How can we differentiate between “normal” archive modification for playback vs. deception? These might have been swapped -- but how can you tell for sure? If the tweets or accts are deleted, we don’t know. If I embed fake tweets in another page, it’s even more confusing. And it is not in Twitter’s (perceived) self-interest to help, cf.: https://techcrunch.com/2018/01/03/why-twitter-wont-remove-trumps-nuclear-war-tweet/
  52. You cannot trust the URL in your browser! Here’s an actual page in the IA “proving” Brian Williams released “Gin and Juice” in 1992, a full year before Snoop Dogg. John Berlin, MS Thesis, 2018 https://www.youtube.com/watch?v=k3QTcJZdFfs (actual URI-R & URI-M have also been faked in video) The content is clearly fake, but imagine replacing: 1) “1992” with a more believable “2016”, 2) the fake domain with “bbc.com”, and 3) Brian Williams rapping with a synthesized Trump or Obama speech.
  53. Blockchain to the rescue!!! <lasers> <sirens> <disco-thumping-soundtrack> Nope. https://www.multichain.com/blog/2015/11/avoiding-pointless-blockchain-project/ https://eprint.iacr.org/2017/375.pdf
  54. Instead, let’s use web archives to monitor web archives.
  55. Step 1: Push to multiple archives
      web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180321/eaw.rhizome.org
      archive.is/20180321/eaw.rhizome.org
      eaw.rhizome.org
  56. Step 2: Compute fixity, publish fixity “manifest” at a well-known location
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/wayback.archive-it.org/all/20180321/eaw.rhizome.org
      manifest.org/20180322/arquivo.pt/wayback/20180321/eaw.rhizome.org
      manifest.org/20180322/archive.is/20180321/eaw.rhizome.org
      It’s understood that archived HTML is continuously rewritten, so only compute fixity on things that should not change, like JPEGs and certain original HTTP response headers. This example assumes the existence of a well-known server manifest.org. Actual URIs can be a bit more complex using “Trusty URIs”: http://ws-dl.blogspot.com/2017/01/2017-01-15-summary-of-trusty-uris.html
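Step 2 above can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: SHA-256, the set of "stable" headers, the record layout, and both function names are assumptions; only the manifest.org URI pattern comes from the slide.

```python
import hashlib
import json
from typing import Dict, Iterable

# Assumption: which original response headers count as "should not change".
STABLE_HEADERS = ("content-type", "last-modified", "etag")


def fixity_entry(
    urim: str,
    body: bytes,
    headers: Dict[str, str],
    stable_headers: Iterable[str] = STABLE_HEADERS,
) -> Dict[str, str]:
    """Hash the payload (e.g. a JPEG) and a canonical form of selected headers."""
    wanted = set(stable_headers)
    kept = {k.lower(): v for k, v in headers.items() if k.lower() in wanted}
    # Sort keys so the same headers always serialize to the same bytes.
    header_blob = json.dumps(kept, sort_keys=True).encode("utf-8")
    return {
        "urim": urim,
        "body_sha256": hashlib.sha256(body).hexdigest(),
        "headers_sha256": hashlib.sha256(header_blob).hexdigest(),
    }


def manifest_uri(observed: str, urim: str, host: str = "manifest.org") -> str:
    """Build the manifest location in the slide's host/observation-date/URI-M form."""
    return f"{host}/{observed}/{urim}"
```

Volatile headers such as `Date` are deliberately left out of the hash, so two fetches of an unchanged memento produce the same entry.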
  57. Wondering about veracity of an archived page? Check manifest.org and recompute fixity.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      What if manifest.org is down? Or possibly hacked? We can’t know that archive.org did not alter the contents on ingest (20180321), but we can verify that they have not changed since our observation (20180322).
  58. Step 4: Push manifest to multiple archives
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Now the 20180322 version of the manifest of archive.org’s memento of rhizome.org is in four different archives. The URIs are ugly, but the bottom line is that an attacker would have to hack a majority of 5 domains (manifest.org + 4 archives). Can repeat for manifests of mementos of rhizome.org in archive-it.org, arquivo.pt, archive.is, etc.
  59. Wondering about veracity of an archived page? Check all copies of manifest.org and take a majority vote.
      manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180321/eaw.rhizome.org
      web.archive.org/web/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      wayback.archive-it.org/all/20180323/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      arquivo.pt/wayback/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      archive.is/20180323/eaw.rhizome.org/manifest.org/20180322/web.archive.org/web/20180321/eaw.rhizome.org
      Caveat 1: If I can hack the rhizome.org page at archive.org, I can probably hack the fixity info there too, so we really have 4 copies, not 5.
      Caveat 2: archive.org and archive-it.org are not independent, so we really have 3 copies, not 5.
      (Yes, this is very similar to textual criticism.)
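The majority vote and its two caveats above can be sketched in Python. This is an illustration under stated assumptions: the input is a mapping from the domain holding a manifest copy to the digest recorded in that copy, the grouping of archive-it.org with archive.org encodes Caveat 2, and the `exclude` argument encodes Caveat 1. All names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Caveat 2: holders that are not independent are collapsed into one voter.
INDEPENDENCE_GROUP = {"archive-it.org": "archive.org"}


def majority_digest(copies: Dict[str, str], exclude: Optional[str] = None) -> Optional[str]:
    """Return the digest backed by a strict majority of independent holders, else None."""
    votes: Dict[str, str] = {}
    for holder, digest in copies.items():
        group = INDEPENDENCE_GROUP.get(holder, holder)
        # Caveat 1: the archive holding the memento under test gets no vote.
        if group == exclude:
            continue
        # One vote per independent group; the first copy seen speaks for the group.
        votes.setdefault(group, digest)
    if not votes:
        return None
    digest, count = Counter(votes.values()).most_common(1)[0]
    return digest if count > len(votes) / 2 else None
```

With five copies and `exclude="archive.org"`, only manifest.org, arquivo.pt, and archive.is vote, which matches the slide's "3 copies, not 5".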
  60. No fixity information? Maybe it’s ok, maybe it’s not. infowars.com/web/20180321/eaw.rhizome.org 404 404 404 404 404 Or perhaps fixity was computed and stored at freedomfries.org; you have to decide if you trust that site. See also: https://www.youtube.com/watch?v=EY15lj-7_lc http://ws-dl.blogspot.com/2017/12/2017-12-11-difficulties-in-timestamping.html
  61. Conclusions
      • Bad news:
        – The web will be the primary vector for increasingly sophisticated disinformation
        – Web archives can be used to forge or obscure the provenance of this information
        – Brian Williams predates Snoop Dogg
      • Good news:
        – Web archives have a role in authenticating who said what, and when
        – Contact Dr. Weigle and me if you are interested in privacy, authenticity, social media, web archiving, etc.