Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes

1,020 views

Published on

Lezing door Srijan Kumar bij VOGIN-IP-lezing, Amsterdam, 9 maart 2017

Published in: Internet
  • Be the first to comment

  • Be the first to like this

Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes

  1. 1. Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes Srijan Kumar Univ. of Maryland Robert West Stanford Univ. Jure Leskovec Stanford Univ. 1 Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016
  2. 2. Web: Source of information 2 62% adults in U.S.A. rely on social media for news 28% of 18- 24 year olds use social media as primary news source
  3. 3. Web: Source of false information 3
  4. 4. Types of false information 4 Misinformation honest mistake Disinformation deliberate lie to mislead Hoax “deliberately fabricated falsehood made to masquerade as truth” Wikipedia
  5. 5. Why Wikipedia? The free encyclopedia that anyone can edit 5 Easy to add (false) information • Freely accessible • Large reach • Major source of information for many
  6. 6. Hoaxes on Wikipedia 6
  7. 7. Data: Wikipedia Hoaxes Hoax article vs hoax facts 7
  8. 8. Data: Wikipedia Hoaxes Hoax article vs hoax facts 21,218 hoax articles 8 Hoax lifecycle:
  9. 9. Wikipedia hoaxes 9 Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Quantify their impact? What are the hoaxes like? Can we find them?
  10. 10. Impact of hoaxes “The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” Jimmy Wales on Quora 10
  11. 11. Impact of hoaxes “The worst hoaxes are those which (a) last for a long time” 11 Time t between patrolling and flagging 0.990.90
  12. 12. Impact of hoaxes “The worst hoaxes are those which (b) receive significant traffic” 12 10 100 500 Number n of pageviews per day
  13. 13. Impact of hoaxes “The worst hoaxes are those which (c) are relied upon by credible news media” 13 1.08 active inlinks per hoax article, on average 7% of hoax articles have at least 5 active inlinks
  14. 14. Wikipedia hoaxes 14 Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Most hoaxes are caught soon, but some hoaxes are impactful What are the hoaxes like? Can we find them?
  15. 15. 15 Successful hoax pass patrol survive for a month viewed 100+/day Failed hoax flagged and deleted during patrol Wrongly flagged temporarily flagged Legitimate articles never flagged Hoax Non-hoax
  16. 16. Characteristics of hoaxes 16 Appearance: how the article looks Link-network: how the article connects Support: how other articles refer to it Editor: how the article creator looks
  17. 17. Characteristics of hoaxes 17 Surprisingly, hoax articles are longer than non-hoax articles! Features: o Plain-text length Appearance: how the article looks Link-network: how the article connects Support: how other articles refer to it Editor: how the article creator looks
  18. 18. Characteristics of hoaxes 18 Surprisingly, hoax articles are longer than non-hoax articles! but they mostly have plain text and have fewer web and wiki links. Appearance: how the article looks Link-network: how the article connects Support: how other articles refer to it Editor: how the article creator looks Features: o Plain-text length o Plain-text-to-markup ratio o Wiki-link density o Web-link density
  19. 19. Characteristics of hoaxes 19 Clustering coefficient = 0 incoherent article Clustering coefficient > 0 coherent article Legitimate articles are more coherent than successful hoaxes Appearance: hoaxes mostly have text and few references. Link-network: how the article connects Support: how other articles refer to it Editor: how the article creator looks
  20. 20. Characteristics of hoaxes 20 Hoax mentions are less in number. Features: o Number of prior mentions Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: how other articles refer to it Editor: how the article creator looks
  21. 21. Characteristics of hoaxes 21 Hoax mentions are less in number, mostly created by article creator or anonymously, and are more recently created. Features: o Number of prior mentions o Creator of first mention o Time since first mention Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: how other articles refer to it Editor: how the article creator looks
  22. 22. Characteristics of hoaxes 22 Hoax creators are more recently registered, and have lesser editing experience. Features: o Creator’s time since registration o Creator’s experience Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: hoaxes have few, recent, suspicious mentions. Editor: how the article creator looks
  23. 23. Wikipedia Hoaxes 23 Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Hoaxes are different from non- hoaxes in many respects Most hoaxes are caught soon, but some hoaxes are impactful Can we find them?
  24. 24. Detection of hoaxes 24 Will a hoax get past patrol? Is an article a hoax? Is an article flagged as hoax really one? AUC = 71% Appearance features AUC = 98% Editor and Network features AUC = 86% Editor and support features
  25. 25. We discovered previously unknown hoaxes! 25 Flagged by us and deleted by Wikipedia administrators Steve Moertel American popcorn entrepreneur Article survived over 6 years 11 months!
  26. 26. Can readers identify hoaxes? 26 Results 320 random hoax and non-hoax pairs 10 raters on Amazon Mechanical Turk rated each pair Casual readers are gullible to hoaxes. Accurate detection needs non-appearance features. 50% Random 66% Human 86% Classifier
  27. 27. What fools humans? 27 Humans get fooled when article looks more “genuine”, and it is assumed to be credible. Comparing easy- vs hard-to-identify hoaxes
  28. 28. How to identify misinformation on the web? 28 ● Appearance ○ How well referenced is the information source? ○ What is the content of the article? ● Editor ○ Who created the information? ● Network ○ How related is this information to other information it references to? ● Support ○ Is there any evidence of the information, prior to its creation?
  29. 29. Wikipedia Hoaxes 29 Impact of hoaxes Characteristics of hoaxes Detection of hoaxes Hoaxes are different from non- hoaxes in many respects Most hoaxes are caught soon, but some hoaxes are impactful Non-appearance features are important to detect hoaxes
  30. 30. Thank you! srijan@cs.umd.edu http://cs.umd.edu/~srijan

×