Twitter Analysis: Fake News


This is our team presentation from Archives Unleashed 3.0 in San Francisco. Our team worked with Twitter data about the second U.S. presidential debate in October 2016.

Published in: Data & Analytics

  1. Twitter Analysis: Fake News
     Allison Hegel, Liuqing Li, Dallas Pillen, Erika Siregar, Melanie Walsh
  2. Our Project
     Question: Is it “fake news” to misquote a presidential candidate by just one word? What about two? Three? When exactly does “fake” news become fake?
     Hypothesis: “Fake news” doesn’t only happen from the top down; it also happens at the very first moment of interpretation, especially when shared on social media networks.
     Goals:
     ● Examine how Twitter users were recording, interpreting, and sharing the words spoken by Donald Trump and Hillary Clinton in real time.
     ● Find out how the “facts” (the accurate transcription of the words) began to evolve into counterfacts or alternate versions of their words.
     ● Find out whether there is any interpretive bias or emotional valence in the tweets.
  3. Dataset
     Data type: Tweets (~740,000 unique tweets)
     Source: Social Feed Manager
     Time range: During and immediately after the second presidential debate (10/09/2016)
     Search terms: #debate, #debates, #debatenight, #debate2016, #debates2016, @HillaryClinton, @realDonaldTrump, @debates
  4. The Data Processing: Create Collections
     Filter the JSON data based on several memorable debate quotes/topics.
     ● Collection 1. Quote: “That was locker room talk.” Keywords: locker room, locker-room, lockerroom
     ● Collection 2. Quote: “Nobody has more respect for women than I do.” Keywords: respect for women
     ● Collection 3. Quote: “You’d be in jail.” Keywords: jail
     ● Collection 4. Quote: “You need both a public and private position on certain issues.” Keywords: public position, private position
  5. Processing the Data
     Pre-processing
     ● Change text to lowercase
     ● Remove hashtags, mentions, URLs
     ● Remove stopwords
     Tweet variance
     ● Use TF-IDF (scikit-learn) to create the term vectors
     ● Calculate the cosine similarity among selected tweets
     Sentiment
     ● Calculate the sentiment value (nltk.sentiment.vader)
     Topic analysis
     ● Create topics in each collection (number of topics: 3; words per topic: 8) (gensim)
  6. Results
     ● Word trees showing quote and response variance
  7. Results
     [Figures: first topic in each collection; sentiments in each collection (locker room, jail, respect for women)]
  8. Most Positive Tweets
  9. Conclusion
     ● Twitter users were recording, interpreting, and sharing the words spoken by Donald Trump and Hillary Clinton in real time, often with some variation or comment
       ○ Sarcastic/insincere comments likely skewed sentiment analysis
     ● Further research would require improving the methods for cleaning the data and analyzing how quotes changed over a longer period of time, how those interpretations were reflected in other outlets, and how influential variances and interpretive biases were in shaping public understanding of what the candidates said, compared to deliberate “fake news”
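
The collection-building step (slide 4) and the pre-processing step (slide 5) could be sketched roughly as follows. This is a minimal illustration, not the team's actual code: the keyword lists come from the slides, but the `text` field name, the substring matching rule, the tiny stopword list, and the sample tweets are all assumptions made for the example.

```python
import json
import re

# Keyword lists copied from the "Create Collections" slide.
COLLECTIONS = {
    "locker room": ["locker room", "locker-room", "lockerroom"],
    "respect for women": ["respect for women"],
    "jail": ["jail"],
    "public/private position": ["public position", "private position"],
}

# Tiny illustrative stopword list (assumption: the slides do not say which list was used).
STOPWORDS = {"a", "an", "and", "be", "do", "for", "i", "in", "is", "it",
             "of", "than", "that", "the", "to", "was", "you"}


def assign_collections(text):
    """Return the names of all collections whose keywords occur in the tweet text.

    Matching is a case-insensitive substring test (an assumption), so "jail"
    also matches "jailed".
    """
    lowered = text.lower()
    return [name for name, keywords in COLLECTIONS.items()
            if any(keyword in lowered for keyword in keywords)]


def preprocess(text):
    """Lowercase, strip URLs, hashtags and mentions, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[#@]\w+", " ", text)       # remove hashtags and mentions
    tokens = re.findall(r"[a-z']+", text)
    return [token for token in tokens if token not in STOPWORDS]


# Invented sample input: one JSON object per line with a "text" field (assumed layout).
sample_lines = [
    json.dumps({"text": "That was LOCKER ROOM talk?! #debate @realDonaldTrump https://t.co/abc"}),
    json.dumps({"text": "Nobody has more respect for women than I do #debatenight"}),
    json.dumps({"text": "Watching the #debates tonight"}),
]

results = []
for line in sample_lines:
    tweet = json.loads(line)
    names = assign_collections(tweet["text"])
    if names:  # keep only tweets that fall into at least one collection
        results.append((names, preprocess(tweet["text"])))

for names, tokens in results:
    print(names, tokens)
```

Filtering on the raw text before cleaning keeps hyphenated keywords such as "locker-room" matchable; whether the original pipeline filtered before or after cleaning is not stated in the slides.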
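
The tweet variance step on slide 5 (TF-IDF term vectors plus cosine similarity) can be illustrated with a small standard-library sketch. The slides name scikit-learn for this step; the code below is a hand-rolled stand-in, not that library, and its smoothed IDF weighting is only meant to resemble scikit-learn's default. The function names and the three token lists are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into L2-normalized {term: weight} dicts.

    Uses raw term counts and a smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {term: count * idf[term] for term, count in counts.items()}
        norm = math.sqrt(sum(weight * weight for weight in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


# Invented, already pre-processed tweets (not taken from the dataset).
tweets = [
    ["locker", "room", "talk"],
    ["just", "locker", "room", "talk"],
    ["respect", "women"],
]

vecs = tfidf_vectors(tweets)
sim_close = cosine(vecs[0], vecs[1])  # near-identical wording: high similarity
sim_far = cosine(vecs[0], vecs[2])    # no shared terms: zero similarity
print(round(sim_close, 3), round(sim_far, 3))
```

In this setup, a tweet that repeats a quote with one extra word scores close to, but below, 1 against the original wording, which is one way to quantify the "one word off" variation the project question asks about.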