Diving into Twitter data on consumer electronic brands

Which consumer electronic brands get tweeted about most? Which brands have more positive or negative sentiment? To find out, 15.3 GB of tweets were downloaded from 13 to 25 May using Python and then analysed in R.

Transcript

  • 1. Diving into Twitter data on consumer electronic brands
  • 2. Which brands get tweeted about most? Is it mainly positive or negative?
  • 3. 15.3 GB of JSON data downloaded from Twitter’s Streaming API between 13 and 25 May using Python
  • 4. Before processing, tweets were in raw JSON format. Fields: time created; tweet text/status; username; tweet location (if available); no. of followers; no. of people followed; no. of statuses; language. The data should be optimized, as only a fraction of it is used for analysis; optimization improves performance in models and saves cost and time.
  • 5. After processing, only some fields were retained and converted to CSV format (shown: the same tweet we saw previously). By optimizing the data, 15.3 GB of JSON was converted to 757 MB of CSV (5% of the original size).
  • 6. Tweets are then tagged for brand and sentiment in R. Tags: brand; positive sentiment; negative sentiment; mixed sentiment. The list of words for sentiment analysis is adapted from the Harvard General Inquirer dictionaries. Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014.
  • 7. Initially, tweets were collected based on 17 keywords: Samsung, S4, Xperia, HTC, Huawei, BlackBerry, Apple, S5, Sony, Nokia, Note 3, Lumia, q5, iPhone, q10, z10, Motorola
  • 8. During a one-hour trial, “Apple” and “iPhone” had an 87% share of tweets. The trial was conducted with 16 keywords on 11 May, 8–9am; 1 GB of JSON data was collected in an hour. Because “Apple” and “iPhone” accounted for 87% of tweet volume, they were removed from the keywords during actual data collection to focus on other brands (and to save space and reduce bandwidth usage).
  • 9. Tweets containing seven keywords were collected from 13 to 25 May: Samsung, Sony, Nokia, HTC, Huawei, BlackBerry, Motorola
  • 10. 3,681,942 tweets were collected. 4% of tweets mentioned > 2 brands; they were excluded from analysis. 8% of tweets had mixed sentiment (i.e., positive and negative sentiment); they were also excluded from analysis. 92% of tweets remained, each mentioning only 1 brand with either “positive”, “negative”, or “neutral” sentiment. After processing, 3,234,678 tweets remained for analysis.
  • 11. Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia. Together, they make up 75% of Twitter buzz. However, Samsung and Sony have wider product offerings relative to the rest, which mainly focus on phones. Also, Huawei’s users may mainly be on Weibo, Renren, etc.
  • 12. Brands have a roughly equal ratio of positive to negative tweets. Most brands have a ratio of roughly 1:1; Samsung is the exception, with a ratio of roughly 3:2.
  • 13. Brands’ share of tweets is roughly consistent over time. Chart annotation: dip due to connectivity issues.
  • 14. Spikes in tweet volume coincide with product launches
  • 15. Spikes in tweet volume coincide with product launches
  • 16. Across brands, there is not much difference in user connectedness. The median user has around 250 followers and also follows 250 people. Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed).* (* Excluding outliers)
  • 17. However, there is a large difference between users’ all-time tweets. The 50th–75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?).* While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest numbers of all-time tweets. This suggests that these tweets are likely to come from real users and not bots (or maybe from less active bots). (* Excluding outliers)
  • 18. CNN’s tweet on Obama’s BlackBerry was “seen” by most followers. Follower counts shown: 12,833,979 followers; 11,796,709 followers.
  • 19. A bot that retweets on farts has the highest all-time tweets. Counts shown: 1,753,696 tweets; 1,730,006 tweets.
  • 20. A bot that retweets on farts has the highest all-time tweets. Counts shown: 1,753,696 tweets; 1,730,006 tweets.
  • 21. An interesting error led to BlackBerry having 100% negative sentiment. Initially, BlackBerry tweets showed 100% negative sentiment. The culprit was the word “lack”; it was removed from the word list. However, removing it reduced negative sentiment for other brands by 2–3%.
  • 22. Where do we go from here? (1) Track brands’ managed Twitter accounts and conversations to measure engagement: which brands have better engagement with users and why? (2) Track the general message of tweets: are tweets of a brand mainly about sales, reviews, complaints, or news? (3) Network analysis to identify users with high centrality and influence: which users have high influence and what are they tweeting about my brand? (4) Geospatial analysis of tweets: are there differences in brand buzz, sentiment, and engagement across regions?
  • 23. Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA (contents: Python script to download tweets in JSON format; Python scripts to convert tweets from JSON to CSV, with and without regular expressions filtering; R script and sentiment analysis list of words; R script and sentiment analysis list of words to reproduce the BlackBerry error)
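
Slides 4 and 5 describe reducing raw tweet JSON to a CSV that keeps only a few fields. A minimal Python sketch of that step follows. It is not the script from the GitHub repository: the key names (`created_at`, `text`, `lang`, `place.full_name`, and `user.screen_name`, `followers_count`, `friends_count`, `statuses_count`) assume the Twitter API v1.1 tweet object, the input is assumed to hold one JSON object per line, and `flatten_tweet` and `json_lines_to_csv` are hypothetical helper names.

```python
import csv
import io
import json

# Columns kept for analysis (mirrors the field list on slide 4).
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def flatten_tweet(tweet):
    """Pick the analysis fields out of one decoded tweet (v1.1-style keys assumed)."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}  # null when the tweet carries no location
    return {
        "created_at": tweet.get("created_at", ""),
        "text": " ".join(tweet.get("text", "").split()),  # collapse newlines and tabs
        "screen_name": user.get("screen_name", ""),
        "location": place.get("full_name", ""),
        "followers_count": user.get("followers_count", ""),
        "friends_count": user.get("friends_count", ""),
        "statuses_count": user.get("statuses_count", ""),
        "lang": tweet.get("lang", ""),
    }


def json_lines_to_csv(lines, out):
    """Write one CSV row per tweet to `out`; return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:
            continue  # skip truncated or corrupt records
        if not isinstance(record, dict) or "text" not in record:
            continue  # skip messages that are not tweets
        writer.writerow(flatten_tweet(record))
        kept += 1
    return kept


if __name__ == "__main__":
    sample = json.dumps({
        "created_at": "Tue May 13 08:00:00 +0000 2014",
        "text": "Trying out the new Nokia Lumia",
        "lang": "en",
        "place": None,
        "user": {"screen_name": "example_user", "followers_count": 250,
                 "friends_count": 250, "statuses_count": 1200},
    })
    buffer = io.StringIO()
    json_lines_to_csv([sample], buffer)
    print(buffer.getvalue())
```

Streaming line by line keeps memory use flat regardless of input size, which matters for a download of this size. For a real file, pass an open file handle as `lines` and a file opened with `newline=""` as `out`.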
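
Slides 6, 10, and 21 describe tagging tweets for brand and sentiment, excluding multi-brand and mixed-sentiment tweets, and the “lack” error. The deck does this in R; the Python sketch below only illustrates one plausible mechanism for the error, namely that a sentiment word matched as a substring also matches inside “BlackBerry”. The word lists are hypothetical toy lists rather than the General Inquirer lists, all function names are invented for illustration, and the keep rule (exactly one brand, no mixed sentiment) is an interpretation of slide 10.

```python
import re

BRANDS = ("samsung", "sony", "nokia", "htc", "huawei", "blackberry", "motorola")

# Hypothetical toy word lists, for illustration only.
POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "lack"}


def tag_brands(text):
    """Return the brand keywords that occur in the tweet (case-insensitive)."""
    lowered = text.lower()
    return [brand for brand in BRANDS if brand in lowered]


def matches_substring(text, words):
    """Naive matching: a word counts even when it is part of a longer word."""
    lowered = text.lower()
    return any(word in lowered for word in words)


def matches_whole_word(text, words):
    """Token matching: a word counts only when it appears as its own token."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not tokens.isdisjoint(words)


def tag_sentiment(text, matcher):
    """Label a tweet positive, negative, mixed, or neutral using `matcher`."""
    positive = matcher(text, POSITIVE)
    negative = matcher(text, NEGATIVE)
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"


def keep_for_analysis(text, matcher=matches_whole_word):
    """Keep tweets that mention exactly one brand and are not mixed."""
    return len(tag_brands(text)) == 1 and tag_sentiment(text, matcher) != "mixed"


if __name__ == "__main__":
    for tweet in ("My new BlackBerry arrived", "I love my BlackBerry"):
        print(tweet,
              "| substring:", tag_sentiment(tweet, matches_substring),
              "| whole word:", tag_sentiment(tweet, matches_whole_word))
```

Under substring matching, every BlackBerry tweet is tagged either negative or mixed, and mixed tweets are excluded, which would leave BlackBerry at 100% negative. Matching whole tokens avoids the false hit while keeping “lack” in the word list, so genuine uses of the word would still count for other brands.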