
ICDM 2017 tutorial misinformation

A rapid increase in social networking services in recent years has enabled people to share and seek information effectively. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. As witnessed in recent incidents of fake news and rumors, misinformation escalates quickly, can impact social media users with undesirable consequences, and can wreak havoc instantaneously. Although many people are aware that fake news and rumors mislead the public and even compromise elections, the problem is not going away. In this tutorial, we discuss how misinformation gains traction in the race for attention, introduce emerging challenges of identifying misinformation, present a comparative survey of current data mining research tackling these challenges, and suggest available resources and directions for future work. http://www.public.asu.edu/~liangwu1/ICDM17MisinformationTutorial.html



  1. Mining Misinformation in Social Media: Understanding Its Rampant Spread, Harm, and Intervention. Liang Wu (Arizona State University), Giovanni Luca Ciampaglia (Indiana University Bloomington), Huan Liu (Arizona State University). Arizona State University, November 21, 2017.
  2. Tutorial Web Page • All materials and resources are available online: http://bit.ly/ICDMTutorial
  3. Introduction
  4. Definition of Misinformation • False and inaccurate information that is spontaneously spread. • Misinformation can be… – Disinformation – Rumors – Urban legend – Spam – Troll – Fake news – … • https://www.kdnuggets.com/2016/08/misinformation-key-terms-explained.html
  5. Ecosystem of Misinformation • Spreaders (motivations: spammer, fraudster, …) – Fabrication • Misinformation – Fake news – Rumors – Spam – Clickbait – … • Influenced Users – Echo chamber – Filter bubble
  6. Misinformation Ramification • Top 10 global risks highlighted for 2014 (World Economic Forum) included: 1. Rising societal tensions in the Middle East and North Africa; 2. Widening income disparities; 3. Persistent structural unemployment; …; 10. The rapid spread of misinformation online.
  7. Word of The Year • Macquarie Dictionary Word of the Year 2016: “fake news” • Oxford Dictionaries Word of the Year 2016: “post-truth”
  8. Social Media • Social media (e.g., Twitter, Facebook, RenRen) has changed the way we exchange and obtain information. • 500 million tweets are posted per day – an effective channel for information dissemination
  9. Social Media: A Channel for Misinformation • False and inaccurate information is pervasive. • Misinformation can be devastating – it causes undesirable consequences and wreaks havoc. • Echo chamber: misinformation can be reinforced. • Filter bubble: misinformation can be targeted.
  10. Two Examples • PizzaGate – fake news has real consequences – What made Edgar Maddison Welch “raid” a “pedo ring” on 12/1/2016? – It all started with a post on Facebook, spread to Twitter, and then went viral via platforms like Breitbart and InfoWars • Anti-Vaccine Movement on Social Media: a case of echo chambers and filter bubbles – Peer-to-peer connections – Groups – Facebook feeds
  11. PizzaGate https://www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html • 1. WikiLeaks began releasing emails of Podesta. 2. Social media users on Reddit searched the releases for evidence of wrongdoing. 3. Discussions were found that include the word pizza, including dinner plans. 4. A participant connected the phrase “cheese pizza” to pedophiles (“c.p.” --> child pornography). 5. Following the use of “pizza,” theorists focused on the Washington pizza restaurant Comet Ping Pong. 6. The theory started snowballing, taking on the meme #PizzaGate; fake news articles emerged. 7. The false stories swept up neighboring businesses and bands that had played at Comet; theories about kill rooms, underground tunnels, satanism and even cannibalism emerged. 8. Edgar M. Welch, a 28-year-old from North Carolina, fired a rifle inside the pizzeria and surrendered after finding no evidence to support claims of child slaves. 9. The shooting did not put the theory to rest: purveyors of the theory and fake news pointed to the mainstream media as conspirators in a coverup to protect what they said was a crime ring. • (Timeline markers: Oct-Nov 2016; Nov 3rd, 2016; Nov 23rd, 2016; Dec 4th, 2016; 2016-2017)
  12. Challenges in Dealing with Misinformation • Large-scale – misinformation can be rampant • Dynamic – it can happen fast • Deceiving – it is hard to verify • Homophily – it is consistent with one’s beliefs
  13. Overview of Today’s Tutorial • Introduction • Misinformation Detection • Misinformation in Social Media • Misinformation Spreader Detection • Resources (section timings: 40, 20, 40, 10, and 10 minutes)
  14. Misinformation Detection
  15. Misinformation in Social Media: An Example
  16. Misinformation in Social Media: An Example (continued)
  17. Misinformation in Social Media: An Example (continued)
  18. Misinformation in Social Media: An Example • Misinformation Spreader • Content of Misinformation – Text, Hashtag, URL, Emoticon, Image, Video (GIF) • Context of Misinformation – Date, Time, Location • Propagation of Misinformation – Retweet, Reply, Like
  19. Overview of Misinformation Detection • Content – individual message or message cluster; supervised (classification) or unsupervised (anomaly detection) [1, 2, 3, 4] • Context (who, when) – anomalous time of bursts [5, 6] • Propagation (how) [7, 8] • Early Detection – lack of data [9], lack of labels [10] • References: [1] Qazvinian et al. "Rumor has it: Identifying misinformation in microblogs." EMNLP 2011. [2] Castillo et al. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013). [3] Zubiaga et al. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media." [4] Wu et al. "Information Credibility Evaluation on Social Media." AAAI 2016. [5] Wang et al. "Detecting rumor patterns in streaming social media." IEEE BigData 2015. [6] Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM 2014. [7] Wu et al. "Characterizing Social Media Messages by How They Propagate." WSDM 2018. [8] Ma et al. "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning." ACL 2017. [9] Sampson et al. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection." CIKM 2016. [10] Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017.
  20. Feature Engineering on Content • Text features (feature: example) – Length of post: #words, #characters – Punctuation marks: question mark (?), exclamation mark (!) – Emojis/emoticons: angry face ;-L – Sentiment: sentiment/swear/curse words – Pronouns (1st, 2nd, 3rd person): I, me, myself, my, mine – URL: PageRank of domain – Mention (@) – Hashtag (#)
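The content features listed above can be sketched as a small extractor; the function name, feature names, and sample post below are illustrative, not from the tutorial's materials:

```python
import re

def extract_text_features(post: str) -> dict:
    """Compute simple content features of the kinds named on the slide."""
    words = post.split()
    return {
        "n_words": len(words),
        "n_chars": len(post),
        "n_question": post.count("?"),
        "n_exclaim": post.count("!"),
        "n_mentions": len(re.findall(r"@\w+", post)),
        "n_hashtags": len(re.findall(r"#\w+", post)),
        "n_urls": len(re.findall(r"https?://\S+", post)),
        # first-person pronoun count (a crude proxy for the pronoun feature)
        "first_person": sum(
            w.lower().strip(".,!?") in {"i", "me", "my", "mine", "myself"}
            for w in words
        ),
    }

feats = extract_text_features(
    "BREAKING!! I can't believe this... #hoax @user http://example.com"
)
```

Such feature dictionaries would then be vectorized and fed to the classifiers discussed on the following slides.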
  21. Misinformation Detection: Text Matching • Text matching against known misinformation – Exact matching – Relevance (TF-IDF, BM25) – Semantic similarity (Word2Vec, Doc2Vec) • Drawback – low recall • Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing." iConference 2014 Proceedings (2014). • Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017.
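The relevance-based matching idea can be sketched with a minimal TF-IDF and cosine similarity (a toy implementation; real systems would use BM25 or a library, and the sample rumors are invented):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF over a small corpus (IDF computed on these docs only)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    vecs = []
    for doc in tokenized:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

known_rumors = ["sharks swimming in houston streets",
                "pizza shop runs secret ring"]
query = "sharks are swimming in the streets"
vecs = tfidf_vectors(known_rumors + [query])
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]  # query vs. each rumor
```

The low-recall drawback noted on the slide shows up immediately: a rephrased rumor sharing no vocabulary with the query scores zero.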
  22. Misinformation Detection: Supervised Learning • Picking data – Message-based: a vector represents a tweet – Message cluster-based: a vector represents a cluster of tweets • Picking a method – Random Forest – SVM – Naïve Bayes – Decision Tree – Maximum Entropy – Logistic Regression
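As a minimal stand-in for the classifiers named above, here is a from-scratch logistic regression trained on toy message-cluster vectors (the features and labels are invented for illustration):

```python
import math

def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression via stochastic gradient descent on log-loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(min(z, 30.0), -30.0)   # clip to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                     # gradient of the log-loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Toy cluster vectors: [#exclamation marks, fraction of posts with URLs]
X = [[3.0, 0.9], [4.0, 0.8], [0.0, 0.1], [1.0, 0.2]]
y = [1, 1, 0, 0]   # 1 = misinformation cluster (labels are made up)
w, b = train_logreg(X, y)
```

The same pipeline applies whether each vector represents one tweet (message-based) or an aggregate of a tweet cluster (cluster-based).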
  23. Visual Content-based Detection • Diversity of images • Example fake image posts: “Texas Pizza Hut workers paddle through flood waters to deliver free pizzas by kayak”; “There are sharks swimming in the streets of Houston during Hurricane Harvey” • Jin, Zhiwei, et al. "Novel visual and statistical image features for microblogs news verification." IEEE Transactions on Multimedia 19.3 (2017): 598-608.
  24. References • Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing." iConference 2014 Proceedings (2014). • Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017. • Gupta, Aditi, and Ponnurangam Kumaraguru. "Credibility ranking of tweets during high impact events." Proceedings of the 1st Workshop on Privacy and Security in Online Social Media. ACM, 2012. • Yu, Suisheng, Mingcai Li, and Fengming Liu. "Rumor Identification with Maximum Entropy in MicroNet." • Yang, Fan, et al. "Automatic detection of rumor on Sina Weibo." Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012. • Zhang, Qiao, et al. "Automatic detection of rumor on social network." Natural Language Processing and Chinese Computing. Springer, Cham, 2015. 113-122. • Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Information credibility on Twitter." Proceedings of the 20th International Conference on World Wide Web. ACM, 2011. • Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013): 560-588. • Qazvinian, Vahed, et al. "Rumor has it: Identifying misinformation in microblogs." Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 2011. • Wu, Shu, et al. "Information Credibility Evaluation on Social Media." AAAI, 2016.
  25. Modeling Message Sequence • The methods above ignore the chronological order of messages • Messages are generated as a temporal sequence – modeling posts as independent documents discards this structural information
  26. Modeling Post Sequence: Message-based • Message-based – linear-chain Conditional Random Fields (CRF) • Zubiaga, Arkaitz, Maria Liakata, and Rob Procter. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media."
  27. Modeling Post Sequence: Cluster-based • Message cluster-based – Recurrent Neural Networks with a classifier layer • Ma et al. "Detecting Rumors from Microblogs with Recurrent Neural Networks." IJCAI 2016.
  28. Personalized Misinformation Detection (PCA) • Detecting anomalous content of a user with PCA • Main assumption – misinformation is likely to be eccentric relative to a user's normal content • Detecting misinformation as content outliers – tweet-based modeling – measure the distance between a new message and the user's historical data • Zhang, Yan, et al. "A distance-based outlier detection method for rumor detection exploiting user behaviorial differences." ICoDSE 2016, IEEE.
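The distance-to-history idea can be sketched as follows (a simplification: the cited method's PCA projection step is omitted, the bag-of-words representation and the threshold value are illustrative assumptions):

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words representation of a post."""
    return Counter(text.lower().split())

def cosine_dist(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / (na * nb) if na and nb else 0.0)

def is_outlier(new_post, history, threshold=0.8):
    """Flag a post whose mean distance to the user's history is large."""
    v = bow(new_post)
    dists = [cosine_dist(v, bow(h)) for h in history]
    return sum(dists) / len(dists) > threshold

history = ["my cat photos today", "another cat photo", "cat nap time"]
```

A post far from everything the user has written before is flagged as a potential misinformation outlier.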
  29. Personalized Misinformation Detection (Autoencoder) • Detecting anomalous content of a user with an autoencoder • Multi-layer autoencoder – train an autoencoder on historical data – to test a message: feed it to the autoencoder, obtain the reconstruction, and compute the distance between the original and the reconstruction • Zhang, Yan, et al. "Detecting rumors on Online Social Networks using multi-layer autoencoder." TEMSCON 2017, IEEE.
  30. Detecting Misinformation with Context • Context of Misinformation – Date, Time, Location
  31. Peak Time of Misinformation • Rebirth of misinformation – misinformation has multiple peaks over time, while true information has only one • Misinformation on Twitter: Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM 2014. • Misinformation on Facebook: Friggeri et al. "Rumor Cascades." ICWSM 2014.
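The multiple-peaks observation suggests a simple temporal signal: count the bursts in a message's share time series. The peak detector and the toy daily counts below are illustrative, not from the cited papers:

```python
def count_peaks(series, min_height=5):
    """Count local maxima at or above min_height; rumors tend to re-surface
    (multiple peaks) while true stories burst once. min_height is an
    arbitrary noise threshold for this sketch."""
    peaks = 0
    for i in range(1, len(series) - 1):
        if (series[i] >= min_height
                and series[i] > series[i - 1]
                and series[i] > series[i + 1]):
            peaks += 1
    return peaks

rumor = [0, 2, 9, 3, 1, 0, 7, 2, 0, 6, 1]   # toy daily share counts
news  = [0, 3, 12, 5, 2, 1, 0, 0, 0, 0, 0]
```

Under this heuristic the rumor series shows several bursts while the news series shows one.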
  32. Detecting Misinformation with Propagation • Propagation of Misinformation – Retweet, Reply, Like
  33. Detecting Misinformation with Propagation • Misinformation is spread by similar users – bot armies – echo chambers of misinformation • Intuition: misinformation can be distinguished by who spreads it, and how it is spread • Challenges – users may change accounts (bot armies) – data sparsity
  34. Detecting Misinformation with Propagation • User embedding: learn user representations from the social network (community structure) • Message classification: model a message's propagation pathway as a sequence of user embeddings and feed it to a sequence classifier • Liang Wu, Huan Liu. "Characterizing Social Media Messages by How They Propagate." WSDM 2018.
  35. Key Issue for Misinformation Detection • April 2013: the AP account tweeted “Two Explosions in the White House and Barack Obama is injured” – Truth: hackers had compromised the account – Nevertheless, it tipped the stock market by $136 billion in 2 minutes
  36. Early Detection of Misinformation: Challenges • Message cluster-based methods – lack of data • Supervised learning methods – lack of labels
  37. Early Detection Challenge I: Lack of Data • At the early stage, few posts are available and they are sparsely scattered • Most methods prove effective only at a later stage
  38. Early Detection: Lack of Data • Linking scattered messages – clustering messages – merging individual messages via hashtag linkage and web (URL) linkage • Sampson et al. "Leveraging the implicit structure within social media for emergent rumor detection." CIKM 2016.
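The implicit-structure idea of Sampson et al. can be sketched with a union-find pass that merges posts sharing a hashtag or URL (a simplification of the cited method; the sample posts are invented):

```python
import re
from collections import defaultdict

def link_messages(posts):
    """Cluster posts that share a hashtag or URL, via union-find."""
    parent = list(range(len(posts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    seen = {}  # hashtag/URL -> first post that used it
    for idx, post in enumerate(posts):
        for token in re.findall(r"#\w+|https?://\S+", post):
            if token in seen:
                union(idx, seen[token])
            else:
                seen[token] = idx

    clusters = defaultdict(list)
    for i in range(len(posts)):
        clusters[find(i)].append(i)
    return sorted(sorted(c) for c in clusters.values())

posts = [
    "so fake #pizzagate",
    "read this http://example.com/story",
    "unbelievable #pizzagate http://example.com/story",
    "totally unrelated post",
]
merged = link_messages(posts)
```

The third post bridges the first two into one cluster, giving the cluster-based detectors enough data earlier than waiting for explicit retweet links would.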
  39. Early Detection Challenge II: Lack of Labels • Traditional text categories – articles within the same category share similar vocabulary and writing styles (e.g., sports news resemble each other) • Misinformation is heterogeneous – two rumors are unlikely to be similar to each other (e.g., a rumor about the presidential election vs. a rumor about the Ferguson protests)
  40. Early Detection (II): Lack of Labels • Utilize user responses to prior misinformation – cluster misinformation with similar responses – select effective features shared by a cluster • Example response cluster 1: “Can't fix stupid but it can be blocked”; “So, when did bearing false witness become a Christian value?”; “Christians Must Support Trump or Face Death Camps. Does he still claim to be a Christian?” • Example response cluster 2: “i've just seen the sign on fb. you can't fix stupid”; “THIS IS PURE INSANITY. HOW ABOUT THIS STATEMENT”; “No Mother Should Have To Fear For Her Son's Life Every Time He Robs A Store” • Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017.
  41. Early Detection Results: Lack of Data • Effectiveness of linkage – classification without linkage vs. classification with hashtag linkage vs. classification with web linkage
  42. Early Detection Results: Lack of Labels • Effectiveness of leveraging prior responses – effectiveness of different methods over time – results at an early stage (2 hours)
  43. Overview of Misinformation Detection (recap) • Content [1, 2, 3, 4] • Context – anomalous time of bursts [5, 6] • Propagation [7, 8] • Early Detection – lack of data [9], lack of labels [10] (references as listed on slide 19)
  44. Spread of Misinformation
  45. Mining Misinformation in Social Media • Giovanni Luca Ciampaglia, glciampaglia.com • ICDM 2017, New Orleans, Nov 21, 2017
  46. Introduction ➢ What is Misinformation and Why it Spreads on Social Media ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation?
  47. Introduction ➢ What is Misinformation and Why it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation?
  48. Pheme ❖ Wartime studies, types of rumors (e.g., pipe dreams) [Knapp 1944; Allport & Postman, 1947] ❖ “Demand” for improvised news [Shibutani, 1968] ❖ Two-step information diffusion [Katz & Lazarsfeld, 1955] ❖ Reputation exchange [Rosnow & Fine, 1976] ❖ Collective sensemaking, watercooler effect [Bordia & DiFonzo 2004] • “Swift is her walk, more swift her winged haste: / A monstrous phantom, horrible and vast. / As many plumes as raise her lofty flight, / So many piercing eyes inlarge her sight; / Millions of opening mouths to Fame belong, / And ev'ry mouth is furnish'd with a tongue, / And round with list'ning ears the flying plague is hung.” (Aeneid, Book IV) • Image: Publij Virgilij maronis opera cum quinque vulgatis commentariis Seruii Mauri honorati gram[m]atici: Aelii Donati: Christofori Landini: Antonii Mancinelli & Domicii Calderini, expolitissimisque figuris atque imaginibus nuper per Sebastianum Brant superadditis, exactissimeque revisis atque elimatis, Straßburg: Grieninger 1502.
  49. Source: “Fake News. It’s Complicated”. First Draft News, medium.com/1st-draft
  50. hoaxy.iuni.iu.edu • Query: “three million votes illegal aliens”
  51. Echo chambers • What is the role of online social networks and social media in fostering echo chambers, filter bubbles, segregation, and polarization? • Adamic & Glance (2005) [blogs]; Conover et al. (2011) [Twitter]
  52. Recap: What is misinformation and why it spreads ❖ Misinformation has always existed ❖ Social media disseminate (mis)information very quickly ❖ Echo chambers insulate people from fact-checking and verification
  53. Introduction ➢ What is Misinformation and How it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation?
  54. Models of Information Diffusion ❖ Compartmental models (SI, SIR, SIS, etc.) [Kermack and McKendrick, 1927] ❖ Rumor spreading models (DK, MT) [Daley and Kendall 1964; Maki 1973] ❖ Independent Cascades Model [Kempe et al., 2005] ❖ Threshold Model, Complex Contagion [Granovetter 1979; Centola 2010] • Complex contagion: the probability of adopting a “meme” at the i-th exposure is P_i(m) ∝ f(i), with f monotonically increasing
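One run of the Independent Cascades model mentioned above can be simulated in a few lines. The toy directed graph and the uniform activation probability are assumptions for illustration (the model also allows per-edge probabilities):

```python
import random

def independent_cascade(graph, seeds, p=0.3, rng=None):
    """Each newly activated node gets one chance to activate each of its
    out-neighbors, succeeding independently with probability p."""
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt   # only fresh activations try again next round
    return active

# toy follower graph: node -> list of out-neighbors
graph = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5]}
spread = independent_cascade(graph, seeds={0}, p=1.0)  # deterministic case
```

With p=1.0 every reachable node is activated; lowering p yields stochastic cascades whose size distribution is what the virality studies on the next slides analyze.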
  55. Simple vs Complex Contagion • Complex contagion: strong concentration of communication inside communities • Simple contagion: weak concentration ❖ Most memes spread like complex contagion ❖ Viral memes spread across communities more like diseases (simple contagion) • Weng et al. (2014) [Twitter]
  56. Weng et al. (2014), Nature Sci. Rep.
  57. Role of the social network and limited attention ❖ Spread among agents with limited attention on a social network is sufficient to explain virality patterns ❖ It is not necessary to invoke more complicated explanations based on intrinsic meme quality • Weng et al. (2014), Nature Sci. Rep.
  58. Can the best ideas win? • Adoption probability tied to meme quality through a fitness function: P(m) ∝ f(m)^α, where f is the fitness (quality) function and α tunes how strongly quality drives adoption
  59. When do the best ideas win? • Discriminative power measured by Kendall’s tau between meme quality and popularity (high-quality vs. low-quality memes)
  60. Recap: Models of the Spread of Misinformation ❖ Simple vs complex contagion ❖ More realistic features ➢ Agents have limited attention ➢ Social network structure ➢ Competition between different memes ❖ Tradeoff between information quality and diversity
  61. Introduction ➢ What is Misinformation and Why it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation?
  62. Bots are super spreaders • Shao et al. 2017 (CoRR)
  63. Bots are strategic • Shao et al. 2017 (CoRR)
  64. Bots are effective • Shao et al. 2017 (CoRR)
  65. Conclusions ❖ What is misinformation and why it spreads ➢ Online, it spreads through a mix of social, cognitive, and algorithmic biases. ❖ Modeling the spread of misinformation ➢ Social network structure, limited attention, and information overload make us vulnerable to misinformation. ❖ Open questions ➢ Bots are strategic superspreaders ➢ They are effective at spreading misinformation. ❖ Tools to detect manipulation of public opinion may be first steps toward a trustworthy Web.
  66. Thanks! • cnets.indiana.edu • iuni.iu.edu • Marcella Tambuscio
  67. WMF Research Showcase, August 17, 2016 • Giovanni Luca Ciampaglia, gciampag@indiana.edu
  68. Recap: Open Questions ❖ Social bots amplify misinformation ➢ Through social reinforcement ➢ Early amplification ➢ Targeting humans, possibly “influentials”
  69. Demand and Supply of Information • 2012 London Olympics [Wikipedia] – attention to topics such as London, England, Usain Bolt, Olympics, Medal, before and after the event (days) • Ciampaglia et al., Sci. Rep. (2015)
  70. Supply of and Demand for Information ❖ Production of information is associated with shifts in collective attention ❖ Evidence that attention precedes production ❖ Higher demand → higher price → more production • Ciampaglia et al., Scientific Reports 2015 (source: Wikipedia)
  71. Predicting Virality • Mechanisms: structural trapping, social reinforcement, homophily • M1: random sampling model • M2: random cascading model (structural trapping) • M3: social reinforcement model (structural trapping + social reinforcement) • M4: homophily model (structural trapping + homophily) • Models range from simple to complex contagion • Weng et al. (2014) [Twitter]
  72. Virality and Competition for Attention • User popularity: #followers [Yahoo! Meme] • Hashtag popularity: #daily retweets [Twitter]
  73. Low Quality Information Just as Likely to Go Viral • Source: Emergent.info [FB shares]
  74. Misinformation Spreader Detection
  75. Misinformation in Social Media: An Example • Misinformation Spreader • Content of Misinformation – Text, Hashtag, URL, Emoticon, Image, Video (GIF) • Context of Misinformation – Date, Time, Location • Propagation of Misinformation – Retweet, Reply, Like
  76. Detecting Misinformation by Its Spreaders • A large portion of OSN accounts are likely to be fake – Facebook: an estimated 67.65 to 137.76 million – Twitter: an estimated 48 million
  77. A Misinformation Spreader • Misinformation spreaders: users that deliberately spread misinformation to mislead others • Example: a phishing link to Twivvter.com
  78. Types of Misinformation Spreaders • Spammers • Fraudsters • Trolls • Crowdturfers • …
  79. Features for Capturing a Spreader • What can be used to detect a spammer? – Profile (profile features) – Posts (text features) – Friends (network features)
  80. Feature Engineering: Profile • Extracting features from a profile – #followers, #followees (e.g., a small #followers suggests a suspicious account) – biography, registration time, screen name, etc.
  81. Feature Engineering: Text • Extracting text features from user posts – text represented as a feature vector: BoW, TF-IDF, etc.
  82. Feature Engineering: Network • Extracting network features – adjacency matrix, number of followers, follower/followee ratio, centrality
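The network features above can be computed directly from an adjacency matrix. The orientation convention (adj[i][j] = 1 means user i follows user j) and the degree-based centrality are assumptions for this sketch:

```python
def network_features(adj):
    """Per-user features from a follower adjacency matrix."""
    n = len(adj)
    feats = []
    for i in range(n):
        followees = sum(adj[i])                        # out-degree
        followers = sum(adj[j][i] for j in range(n))   # in-degree
        ratio = followers / followees if followees else 0.0
        feats.append({
            "followers": followers,
            "followees": followees,
            "ratio": ratio,
            # degree centrality: fraction of possible connections held
            "degree_centrality": (followers + followees) / (n - 1),
        })
    return feats

adj = [[0, 1, 1],   # user 0 follows users 1 and 2
       [0, 0, 1],   # user 1 follows user 2
       [0, 0, 0]]   # user 2 follows nobody
feats = network_features(adj)
```

A spammer-like profile (many followees, few followers) shows up as a low follower/followee ratio.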
  83. Overview: Misinformation Spreader Detection • Content – text mining [1] • Content + network – text + graph mining [2, 3] • Camouflaged content – instance (post/user) selection [4, 5, 6] • [1] Jindal, Nitin, and Bing Liu. "Review spam detection." WWW 2007, ACM. [2] Hu, X., Tang, J., Zhang, Y., and Liu, H. "Social Spammer Detection in Microblogging." IJCAI 2013. [3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning." International Conference on Neural Information Processing. Springer, Cham, 2017. [4] Wu, L., Hu, X., Morstatter, F., Liu, H. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017, pp. 319-326. [5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM 2017. [6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." KDD 2016, ACM.
  84. Supervised Learning: Content + Network • Features for supervised learning – text features (a textual feature vector per user) – network features (adjacency matrix) – profile features
  85. Traditional Approach: Content Modeling • Supervised learning with text features • Assumption: positive and negative accounts can be distinguished by their text (model coefficients estimated from labeled data)
  86. Traditional Approach: Network Modeling • Supervised learning with network features (adjacency matrix) • Assumption: friends are likely to have the same label (model coefficients estimated from labeled data)
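The "friends are likely to have the same label" assumption can be illustrated with simple label propagation over the adjacency matrix. Note this substitutes an iterative majority vote for the slide's regularized regression; the toy graph and labels are invented:

```python
def propagate_labels(adj, labels, iters=10):
    """Unlabeled users (None) adopt the majority label of their neighbors;
    known labels stay fixed. Encodes the same homophily intuition as the
    network-regularized model on the slide, in a much simpler form."""
    n = len(adj)
    cur = list(labels)
    for _ in range(iters):
        nxt = list(cur)
        for i in range(n):
            if labels[i] is not None:
                continue  # keep supervised labels fixed
            votes = [cur[j] for j in range(n)
                     if adj[i][j] and cur[j] is not None]
            if votes:
                nxt[i] = max(set(votes), key=votes.count)
        cur = nxt
    return cur

# users 0-1: known spammers; user 3: known legitimate; user 2: unlabeled,
# connected to the two spammers
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0]]
labels = [1, 1, None, 0]
result = propagate_labels(adj, labels)
```

The unlabeled account linked only to known spammers inherits the spammer label, which is exactly what camouflage via link farming (next slides) tries to subvert.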
  87. Emerging Challenge: Camouflage • Content camouflage – copy content from legitimate users – exploit compromised accounts • Network camouflage – link farming with other spreaders and bots – link farming with normal users
  88. Challenge (I): Camouflage • To avoid being detected, spreaders – manipulate the text feature vector by posting content similar to regular users’ – manipulate the adjacency matrix by harvesting links with other users • Both break the assumptions that accounts can be distinguished by text and that friends share labels
  89. Challenge (II): Network Camouflage • Heuristic-based methods become unreliable – #followers – follower/followee ratio – anomaly detection
  90. Challenge III: Limited Label Information • Labeling a malicious account (positive) – suspended accounts – honeypots • A honeypot is a set of accounts created to lure bots in the wild; any user that follows them is assumed to be a bot (normal users can easily recognize and avoid them) • Lack of labeled camouflage
91. 91. Arizona State University Mining Misinformation in Social Media, November 21, 2017 62 Camouflage • Prior assumptions (violated by camouflage): – All suspended accounts are misinformation spreaders – All posts of a spreader are malicious • Relaxations: – Selecting a subset of users for training – Selecting a subset of posts for training
92. 92. Arizona State University Mining Misinformation in Social Media, November 21, 2017 63 Selecting Users for Training • How to select the optimal set for training? An iterative procedure: (1) select a subset of users for training, (2) evaluate with a validation set, (3) update the training set, and repeat. Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
93. 93. Arizona State University Mining Misinformation in Social Media, November 21, 2017 64 Relaxation I: Group Structure • Assumption: malicious accounts cannot join a legitimate community – organize users in groups – users in the same group should be similarly weighted • Example hierarchy — layer 0: G_1^0 = {1, 2, 3, 4, 5, 6, 7}; layer 1: G_1^1 = {1, 2, 3, 4}, G_2^1 = {5}, G_3^1 = {6, 7}; layer 2: G_1^2 = {1, 2}, G_2^2 = {3, 4} Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
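The example hierarchy on this slide can be encoded directly as index groups per layer; this is the structure a hierarchical group penalty iterates over (plain-Python illustration, names our own):

```python
# The slide's example hierarchy of user groups, one list of groups per
# Louvain layer (layer 0 is the whole user set).
hierarchy = [
    [[1, 2, 3, 4, 5, 6, 7]],       # layer 0: G_1^0
    [[1, 2, 3, 4], [5], [6, 7]],   # layer 1: G_1^1, G_2^1, G_3^1
    [[1, 2], [3, 4]],              # layer 2: G_1^2, G_2^2
]
# flatten: a sparse group lasso sums one L2 norm per group, over all layers
groups = [g for layer in hierarchy for g in layer]
```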
94. 94. Arizona State University Mining Misinformation in Social Media, November 21, 2017 65 Relaxation II: Weighted Training
min_{w,c} Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1·‖w‖₂² + λ2·Σ_{i=0}^{d} Σ_{j=1}^{n_i} ‖c_{G_j^i}‖₂
subject to Σ_i c_i = K, 0 < c_i < 1
• The first term fits the weighted instances; the λ1 term avoids overfitting; the λ2 term is a sparse group lasso: an L1 norm on the inter-group level and an L2 norm on the intra-group level.
• d: depth of the hierarchy from the Louvain method; n_i: number of groups on layer i; c_{G_j^i}: weights of the nodes of group j on layer i.
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
95. 95. Arizona State University Mining Misinformation in Social Media, November 21, 2017 66 Optimization (Step 1: fix c, update w)
With the instance weights c fixed, the objective reduces to a weighted ridge regression:
min_w Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1·‖w‖₂²
• c_i: weight of instance i; x_i: attribute vector of instance i; y_i: label of instance i; w: coefficients of the linear regression; N: number of instances; the ‖w‖₂² term avoids overfitting.
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
96. 96. Arizona State University Mining Misinformation in Social Media, November 21, 2017 67 Optimization (Step 2: fix w, update c)
With w fixed, let t_i = (x_i·w − y_i)² be the residual of instance i; the weights solve a group-lasso problem:
min_c Σ_{i=1}^{N} c_i·t_i + λ2·Σ_{i=0}^{d} Σ_{j=1}^{n_i} ‖c_{G_j^i}‖₂
subject to Σ_i c_i = K, 0 < c_i < 1
• d: depth of the hierarchy from the Louvain method; n_i: number of groups on layer i; c_{G_j^i}: weights of the nodes of group j on layer i.
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
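A rough runnable sketch of this alternating scheme (our own simplification, not the authors' code: the w-step uses the closed-form weighted ridge solution, while the c-step uses a projected-gradient heuristic rather than an exact group-lasso solver; the toy data, groups, and constants are illustrative):

```python
import numpy as np

def fit_w(X, y, c, lam1):
    """Step 1: with weights c fixed, w solves a weighted ridge regression."""
    C = np.diag(c)
    d = X.shape[1]
    return np.linalg.solve(X.T @ C @ X + lam1 * np.eye(d), X.T @ C @ y)

def update_c(X, y, w, groups, lam2, K, steps=200, lr=0.01):
    """Step 2: with w fixed, adjust instance weights c to favor instances
    the model fits well, plus a group-lasso term over user groups."""
    t = (X @ w - y) ** 2                 # per-instance squared residuals
    c = np.full(len(y), K / len(y))      # start uniform, summing to K
    for _ in range(steps):
        g = t.copy()                     # gradient of sum_i c_i * t_i
        for idx in groups:               # add group-lasso (sub)gradient
            norm = np.linalg.norm(c[idx])
            if norm > 0:
                g[idx] += lam2 * c[idx] / norm
        c -= lr * g
        c = np.clip(c, 1e-6, 1 - 1e-6)   # keep 0 < c_i < 1
        c *= K / c.sum()                 # approximately enforce sum = K
    return c

# toy problem: 8 instances in two groups, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=8)
groups = [np.arange(0, 4), np.arange(4, 8)]
c = np.full(8, 0.5)
for _ in range(5):                        # alternate the two steps
    w = fit_w(X, y, c, lam1=0.1)
    c = update_c(X, y, w, groups, lam2=0.05, K=4.0)
```

Each pass tightens the fit on the instances the current weights trust, then re-weights instances under the group constraint, mirroring the two sub-problems on the slides.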
97. 97. Arizona State University Mining Misinformation in Social Media, November 21, 2017 68 Experimental Results • Collecting data with honeypots – http://infolab.tamu.edu/data/
Dataset: Tweets 4,453,380 | Users 38,400 | ReTweets 223,115 | Links 8,739,105 | Spammers 19,200
Approaches | Precision | Recall | F-Score
SSDM1 | 92.15% | 92.00% | 92.07%
NFS2 | 88.16% | 65.67% | 75.27%
SGASD | 93.75% | 96.92% | 95.31%
1 Hu et al., "Social spammer detection in microblogging." IJCAI 2013. 2 Ye et al., "Discovering opinion spammer groups by network footprints." ECML-PKDD 2015.
98. 98. Arizona State University Mining Misinformation in Social Media, November 21, 2017 69 Content Camouflage • Basic assumption of traditional methods: – All content of a misinformation spreader is malicious • Content camouflage: posts of a misinformation spreader may be legitimate – Copied from legitimate users – Posted through compromised accounts
99. 99. Arizona State University Mining Misinformation in Social Media, November 21, 2017 70 Content Camouflage: An Example (screenshots of two normal-looking posts used as camouflage)
  100. 100. Arizona State University Mining Misinformation in Social Media, November 21, 2017 71 Challenge: Lack of Labeled Data • Labels of camouflage are costly to collect
101. 101. Arizona State University Mining Misinformation in Social Media, November 21, 2017 72 Learning to Identify Camouflage • Assumption: posts of misinformation spreaders are a mix of normal and malicious content. • Introduce a weight for each post label. • Select the posts that best distinguish misinformation spreaders from normal users.
  102. 102. Arizona State University Mining Misinformation in Social Media, November 21, 2017 73 Learning to Identify Camouflage: Formulation Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
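The formulation itself is in the cited paper; below is only a hedged toy sketch of the underlying idea (our own simplification, not the paper's objective): iteratively downweight a spreader's posts that look indistinguishable from normal users' posts, so camouflage contributes little to training.

```python
import numpy as np

def weight_posts(X_spreader, X_normal, iters=5):
    """Return a weight in (0, 1) per spreader post; posts closer to the
    normal users' centroid than to the (weighted) spreader centroid are
    treated as likely camouflage and downweighted."""
    weights = np.ones(len(X_spreader))
    centroid_n = X_normal.mean(axis=0)
    for _ in range(iters):
        centroid_s = (weights[:, None] * X_spreader).sum(0) / weights.sum()
        d_s = np.linalg.norm(X_spreader - centroid_s, axis=1)
        d_n = np.linalg.norm(X_spreader - centroid_n, axis=1)
        weights = 1.0 / (1.0 + np.exp(d_s - d_n))  # sigmoid of distance gap
    return weights

# toy post features: three malicious-looking posts and one camouflage post
X_spreader = np.array([[2.0, 2.0], [2.1, 1.9], [1.9, 2.1], [0.0, 0.0]])
X_normal = np.array([[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]])
w_posts = weight_posts(X_spreader, X_normal)
```

The camouflaged post (near the normal centroid) ends up with a low weight, so a downstream classifier trained on the weighted posts relies mostly on the genuinely malicious content.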
103. 103. Arizona State University Mining Misinformation in Social Media, November 21, 2017 74 Experimental Results • Findings: – Sophisticated misinformation spreaders first disguise themselves, and then do harm. Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
104. 104. Arizona State University Mining Misinformation in Social Media, November 21, 2017 75 Misinformation Spreader Detection • Summary of methods: – Text mining on content [1] – Text + graph mining on content and network [2, 3] – Instance (post/user) selection to handle camouflage [4, 5, 6]
[1] Jindal, Nitin, and Bing Liu. "Review spam detection." WWW 2007. [2] Hu, X., Tang, J., Zhang, Y., and Liu, H. "Social spammer detection in microblogging." IJCAI 2013. [3] Song, Yuqi, et al. "PUD: Social spammer detection based on PU learning." ICONIP 2017. [4] Wu, L., Hu, X., Morstatter, F., and Liu, H. "Adaptive spammer detection with sparse group modeling." ICWSM 2017, pp. 319-326. [5] Wu, Liang, et al. "Detecting camouflaged content polluters." ICWSM 2017. [6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." KDD 2016.
  105. 105. Arizona State University Mining Misinformation in Social Media, November 21, 2017 76 Challenges in Dealing with Misinformation • Large-scale – Misinformation can be rampant • Dynamic – It can happen fast • Deceiving – Hard to verify • Homophily – Consistent with one’s beliefs
106. 106. Arizona State University Mining Misinformation in Social Media, November 21, 2017 77 Code, Platforms, and Datasets
  107. 107. Arizona State University Mining Misinformation in Social Media, November 21, 2017 78 Platforms • TweetTracker: Detecting Topic-centric Bots • Hoaxy: Tracking Online Misinformation • Botometer: Detecting Bots on Twitter
  108. 108. Arizona State University Mining Misinformation in Social Media, November 21, 2017 79 Fact-Checking Websites • Online Fact-checking websites – PolitiFact: http://www.politifact.com/ – Truthy: http://truthy.indiana.edu/ – Snopes: http://www.snopes.com/ – TruthOrFiction: https://www.truthorfiction.com/ – Weibo Rumor: http://service.account.weibo.com/ • Volunteering committee
  109. 109. Arizona State University Mining Misinformation in Social Media, November 21, 2017 80 Code and Data Repositories • Honeypot: http://bit.ly/ASUHoneypot • Identification: https://veri.ly/ • Diffusion: – Python Networkx: https://networkx.github.io/ – Stanford SNAP: http://snap.stanford.edu/ • Datasets – http://socialcomputing.asu.edu/pages/datasets – http://bit.ly/asonam-bot-data – https://github.com/jsampso/AMNDBots – http://carl.cs.indiana.edu/data/#fact-checking – http://snap.stanford.edu/data/index.html
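As a small example of the diffusion analyses these libraries support, here is a self-contained independent-cascade simulation in plain Python (the toy graph and activation probability are illustrative):

```python
import random

def independent_cascade(adj, seeds, p=0.3, rng=None):
    """Simulate the independent cascade model: adj maps a node to its
    neighbors; each newly activated node gets one chance to activate
    each inactive neighbor with probability p."""
    rng = rng or random.Random(0)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

# toy diffusion network: node 0 is the misinformation source
adj = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
spread = independent_cascade(adj, seeds=[0], p=0.9)
```

The same experiment scales up directly on graphs loaded with NetworkX or SNAP from the datasets listed above.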
  110. 110. Arizona State University Mining Misinformation in Social Media, November 21, 2017 81 Book Chapters • “Mining Misinformation in Social Media’’, Chapter 5 in Big Data in Complex and Social Networks • http://bit.ly/2AYr5KM • “Detecting Crowdturfing in Social Media’’, Encyclopedia of Social Network Analysis and Mining • http://bit.ly/2hE6LXE
  111. 111. Arizona State University Mining Misinformation in Social Media, November 21, 2017 82 Twitter Data Analytics • Common tasks in mining Twitter Data. – Free Download with Code & Data – Collection – Analysis – Visualization tweettracker.fulton.asu.edu/tda/
  112. 112. Arizona State University Mining Misinformation in Social Media, November 21, 2017 83 Social Media Mining • Social Media Mining: An Introduction – a textbook • A comprehensive coverage of social media mining techniques – Free Download – Network Measures and Analysis – Influence and Diffusion – Community Detection – Classification and Clustering – Behavior Analytics http://dmml.asu.edu/smm/
  113. 113. Arizona State University Mining Misinformation in Social Media, November 21, 2017 84 Challenges in Dealing with Misinformation • Large-scale – Misinformation can be rampant • Dynamic – It can happen fast • Deceiving – Hard to verify • Homophily – Consistent with one’s beliefs
114. 114. Arizona State University Mining Misinformation in Social Media, November 21, 2017 85 Q&A • Presenters: Liang Wu, Giovanni Luca Ciampaglia, Huan Liu • {wuliang, huanliu}@asu.edu • gciampag@indiana.edu • All materials and resources are available online: http://bit.ly/ICDMTutorial
  115. 115. Arizona State University Mining Misinformation in Social Media, November 21, 2017 86 Acknowledgements • DMML @ ASU • NaN @ IUB • MINERVA initiative through the ONR N000141310835 on Multi-Source Assessment of State Stability