Diving into Twitter data on consumer electronic brands
Which brands get tweeted about most? Is it mainly positive or negative?
15.3 GB of JSON data downloaded from Twitter’s Streaming API
between 13–25 May using Python
Before processing, tweets were in raw JSON format 
Time Created 
Tweet text/status 
Username 
Tweet location (if available) 
No. of followers 
No. of people followed 
No. of statuses 
Language 
The data should be optimized, since only a fraction of it is used for analysis; optimization improves model performance and saves cost and time
The same tweet we saw previously 
By optimizing the data, 
15.3 GB of JSON was converted to 757 MB of CSV (5% of original size)
After processing, only some fields were retained and converted to CSV format
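The conversion step can be sketched roughly as follows. This is a minimal illustration, not the script from the repository: it assumes the raw file holds one JSON object per line (as the Streaming API delivers them) and that the retained fields use the Twitter v1.1 tweet JSON names (created_at, text, user.screen_name, and so on). The function names tweet_to_row and json_lines_to_csv are hypothetical.

```python
import csv
import io
import json

# Retained columns; names assumed to follow Twitter's v1.1 tweet JSON.
FIELDS = ["created_at", "text", "screen_name", "location",
          "followers_count", "friends_count", "statuses_count", "lang"]


def tweet_to_row(tweet):
    """Flatten one tweet dict into the retained fields."""
    user = tweet.get("user") or {}
    return {
        "created_at": tweet.get("created_at", ""),
        # Collapse newlines and repeated spaces so each tweet stays on one row.
        "text": " ".join((tweet.get("text") or "").split()),
        "screen_name": user.get("screen_name", ""),
        "location": user.get("location") or "",
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        "statuses_count": user.get("statuses_count", 0),
        "lang": tweet.get("lang", ""),
    }


def json_lines_to_csv(lines, out):
    """Write tweets from an iterable of JSON lines to `out` as CSV.

    Returns the number of tweets written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    kept = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated record, e.g. after a dropped connection
        if not isinstance(record, dict) or "text" not in record:
            continue  # stream notices (limit, delete) are not tweets
        writer.writerow(tweet_to_row(record))
        kept += 1
    return kept
```

With a real file, `lines` would be the open JSON file and `out` a CSV file opened with newline="". Dropping every unused field is where most of the size reduction would come from.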
Brand 
Positive Sentiment 
Brand 
Negative Sentiment 
Brand 
Mixed Sentiment 
The list of words for sentiment analysis is adapted from 
the Harvard General Inquirer dictionaries 
Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014 
Tweets are then tagged for brand and sentiment in R
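As a rough illustration of the tagging step: the deck does this in R, so the Python below is only a sketch of the idea and not the author's script. The tiny word lists are placeholders standing in for the General Inquirer-based lists, and tag_tweet is a hypothetical name.

```python
import re

# The seven brand keywords tracked in the main collection.
BRANDS = ["samsung", "sony", "nokia", "htc", "huawei", "blackberry", "motorola"]

# Placeholder word lists (hypothetical); the deck's lists are adapted from
# the Harvard General Inquirer dictionaries.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "broken", "hate"}


def tag_tweet(text):
    """Return (brands mentioned, sentiment label) for one tweet."""
    # Split into lowercase word tokens so only whole words are compared.
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    brands = [b for b in BRANDS if b in tokens]
    has_positive = bool(tokens & POSITIVE_WORDS)
    has_negative = bool(tokens & NEGATIVE_WORDS)
    if has_positive and has_negative:
        sentiment = "mixed"
    elif has_positive:
        sentiment = "positive"
    elif has_negative:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return brands, sentiment
```

A tweet with words from both lists is labelled "mixed", and one with words from neither is labelled "neutral", matching the categories used later in the deck.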
Initially, tweets were collected based on 17 keywords
Samsung 
S4 
Xperia 
HTC 
Huawei 
BlackBerry 
Apple 
S5 
Sony 
Nokia 
Note 3 
Lumia 
q5 
iPhone 
q10 
z10 
Motorola
“Apple” and “iPhone” accounted for 87% of tweet volume 
Both were removed from the keywords for the actual data collection to focus on other brands (and to save space and reduce bandwidth usage)
A trial was conducted with 16 keywords on 11 May, 8–9am
1 GB of JSON data was collected in an hour
During a one-hour trial, “Apple” and “iPhone” had 87% share of tweets
Samsung 
Sony 
Nokia 
HTC 
Huawei 
BlackBerry 
Motorola 
Tweets containing seven keywords were collected from 13–25 May
4% of tweets mentioned more than one brand; they were excluded from analysis
8% of tweets had mixed sentiment (i.e., positive and negative sentiment); they were excluded from analysis 
92% of tweets remained, each only mentioning 1 brand with either “positive”, “negative”, or “neutral” sentiment 
3,681,942 tweets were collected 
After processing, 3,234,678 tweets remained for analysis
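The exclusion rules above could be applied along these lines. This is a sketch under assumptions: each tweet is assumed to be already tagged as a (list of brands, sentiment label) pair, tweets that mention no brand at all are dropped together with the multi-brand ones (the deck does not say how those are handled), and filter_tagged is a hypothetical name.

```python
from collections import Counter


def filter_tagged(tagged):
    """Keep tweets that mention exactly one brand and are not mixed.

    `tagged` is an iterable of (brands, sentiment) pairs.
    Returns (kept, dropped) where kept holds (brand, sentiment) pairs
    and dropped counts the reason for each exclusion.
    """
    kept = []
    dropped = Counter()
    for brands, sentiment in tagged:
        if len(brands) != 1:
            dropped["not exactly one brand"] += 1
        elif sentiment == "mixed":
            dropped["mixed sentiment"] += 1
        else:
            kept.append((brands[0], sentiment))
    return kept, dropped
```

Counting the reasons separately makes it easy to report what share of tweets each rule removes.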
Samsung leads in Twitter buzz, followed by Sony and Nokia
Together, they make up 75% of Twitter buzz
Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia
However, Samsung and Sony have wider product offerings than the rest, which mainly focus on phones
Also, Huawei’s users may mainly be on Weibo, Renren, etc.
Most brands have a roughly 1:1 ratio of positive to negative tweets
Samsung is the exception, with a ratio of roughly 3:2
Brands have a roughly equal ratio of positive to negative tweets
Dip due to connectivity issues 
Brands’ share of tweets is roughly consistent over time
Spikes in tweet volume coincide with product launches
Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed)* 
* Excluding outliers 
Across brands, there is not much difference in user connectedness 
The median user has around 250 followers and also follows 250 people
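A per-brand median like the one described here can be computed with a few lines. The sketch below assumes each row is a dict with a "brand" key plus numeric user fields such as "followers_count" (hypothetical column names), and it does not remove outliers, since the deck does not say how they were excluded.

```python
from collections import defaultdict
from statistics import median


def median_by_brand(rows, field):
    """Return {brand: median of `field`} over an iterable of row dicts."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["brand"]].append(row[field])
    return {brand: median(values) for brand, values in grouped.items()}
```

The median is a reasonable choice for follower counts because a handful of accounts with millions of followers would dominate a mean.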
50th–75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?)*
While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest numbers of all-time tweets
This suggests that these tweets are likely to come from real users and not bots (or maybe less active bots)
* Excluding outliers
However, there is a large difference between users’ all-time tweets
12,833,979 followers
11,796,709 followers
CNN’s tweet on Obama’s BlackBerry was “seen” by the most followers
1,753,696 tweets
1,730,006 tweets
A bot that retweets about farts has the highest number of all-time tweets
Initially, BlackBerry tweets showed 100% negative sentiment
The culprit was the word “lack”, which was then removed from the word list
However, removing it reduced negative sentiment for other brands by 2–3%
An interesting error led to BlackBerry having 100% negative sentiment
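The error is easy to reproduce, because “BlackBerry” contains the letters “lack”. The sketch below assumes the original matching was done on substrings, which the deck does not state explicitly. It also shows one possible alternative to removing the word: matching on whole words with a regular expression word boundary. The word list and function names are hypothetical.

```python
import re

# Placeholder negative word list (hypothetical), including the culprit.
NEGATIVE_WORDS = ["lack", "broken"]

# \b anchors the match at word boundaries, so "lack" no longer
# matches inside "BlackBerry".
WHOLE_WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in NEGATIVE_WORDS) + r")\b",
    flags=re.IGNORECASE,
)


def negative_by_substring(text):
    """Flag a tweet if any negative word appears anywhere in the text."""
    lowered = text.lower()
    return any(word in lowered for word in NEGATIVE_WORDS)


def negative_by_whole_word(text):
    """Flag a tweet only if a negative word appears as a whole word."""
    return WHOLE_WORD_PATTERN.search(text) is not None
```

With whole-word matching, “lack” could stay in the word list, which might avoid the 2–3% drop in negative sentiment seen for the other brands after it was removed.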
Track brands’ managed Twitter accounts and conversations to measure engagement
Which brands have better engagement with users and why? 
Track general message of tweets 
Are tweets of a brand mainly about sales, reviews, complaints, or news? 
Network analysis to identify users with high centrality and influence 
Which users have high influence and what are they tweeting about my brand? 
Geospatial analysis of tweets 
Are there differences in brand buzz, sentiment, and engagement across regions? 
Where do we go from here?
Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA 
Python script to download tweets in JSON format 
Python scripts to convert tweets from JSON to CSV (with & without regular expressions filtering) 
R script and sentiment analysis list of words 
R script and sentiment analysis list of words to reproduce BlackBerry error
