Hamdan Azhar
hamdan@prismoji.com // @hamdanazhar
// November 5, 2016
🐍s,🌹s, & major 🔑s
an introduction to emoji data science
🗃 📊🗃
why emoji data science?
http://theislamicmonthly.com/neither-here-nor-there-on-losing-my-snapchat-best-friend/
emojis data science
Overarching goals
■ Understanding what emojis mean
■ Using emojis to understand the topics we use them to discuss
■ Getting past the “so what” hurdle and defining good questions to ask
the birth of
My reaction to this article,
in emoji
So we decided to look at some actual data
Getting the data
■ Use the Twitter API to sample 100,000 tweets for five hashtags related to Britain’s EU Referendum
• Hashtags: #NotMyVote, #VoteRemain, #EURef, #Brexit, #VoteLeave
• Data pulled for June 24, the day after the referendum
• English-language tweets only
• After removing retweets, we’re left with 23,989 unique tweets, i.e. the “Brexit dataset”
• Of these, 1,505 tweets (6.3%) contain at least one emoji
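The filtering steps above could be sketched roughly as follows. This is a Python illustration rather than the code used for the talk, and it assumes the tweets have already been fetched. The dictionary keys `text` and `lang`, the `RT @` prefix check, and the function names `build_dataset` and `emoji_share` are all assumptions made for the example.

```python
def build_dataset(raw_tweets):
    """Keep unique English-language tweets that are not retweets.

    Assumes each tweet is a dict with hypothetical "text" and "lang" keys.
    """
    seen = set()
    unique = []
    for tweet in raw_tweets:
        text = tweet["text"].strip()
        if tweet.get("lang") != "en":
            continue  # English-language tweets only
        if text.startswith("RT @"):
            continue  # assumption: retweets are marked by an "RT @" prefix
        if text in seen:
            continue  # drop exact duplicates
        seen.add(text)
        unique.append(text)
    return unique


def emoji_share(n_with_emoji, n_total):
    """Percentage of tweets containing at least one emoji, to one decimal."""
    return round(100 * n_with_emoji / n_total, 1)
```

With the counts from the slide, `emoji_share(1505, 23989)` gives 6.3.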
Analyzing the data
• Use regular expressions in R, along with Unicode emoji dictionaries, to extract emojis from tweets
• Compute emoji counts in the Brexit dataset
• Compare with counts for all >10B emoji tweets on Twitter since 2013 (from emojitracker.com)
• Extract hashtags from tweets and compute hashtag profiles for various emojis
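A minimal sketch of the extraction and counting steps, written in Python rather than the R used for the talk. The regular expression below covers only a few common Unicode ranges and stands in for a full emoji dictionary, so treat it as an assumption. The names `EMOJI_RE`, `HASHTAG_RE`, `extract_emojis`, `extract_hashtags`, and `emoji_counts` are hypothetical.

```python
import re
from collections import Counter

# Simplified pattern (an assumption, not a complete Unicode emoji dictionary).
EMOJI_RE = re.compile(
    "(?:[\U0001F1E6-\U0001F1FF]{2})"                # flags: two regional indicators
    "|[\U0001F300-\U0001FAFF\u2600-\u27BF]\uFE0F?"  # pictographs, optional variation selector
)
HASHTAG_RE = re.compile(r"#\w+")


def extract_emojis(text):
    """Return every emoji found in the text, in order of appearance."""
    return EMOJI_RE.findall(text)


def extract_hashtags(text):
    """Return lowercased hashtags so #Brexit and #brexit count as one."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def emoji_counts(tweets):
    """Count emoji occurrences across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_emojis(text))
    return counts
```

This pattern treats skin-tone modifiers as separate emojis and does not join multi-part sequences, which a real emoji dictionary would handle.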
Which emojis over-index most heavily for Brexit?
(above and beyond their usual popularity on Twitter)

emoji | emoji name | brexit rank | general rank | brexit index* | general index* | overindex**
😂 | face with tears of joy | 1 | 1 | 100 | 100 |
🇬🇧 | flag of united kingdom | 2 | 363 | 87 | 0.2 | 400x
👍 | thumbs up sign | 3 | 18 | 26 | 11 | 2.3x
👏 | clapping hands sign | 4 | 45 | 24 | 6 | 3.9x
❤️ | heavy black heart | 5 | 3 | 21 | 45 |
😭 | loudly crying face | 6 | 7 | 17 | 29 |
😔 | pensive face | 7 | 13 | 14 | 18 |
😩 | weary face | 8 | 11 | 13 | 22 |
😢 | crying face | 9 | 27 | 12 | 9 | 1.3x
🙈 | see-no-evil monkey | 10 | 24 | 12 | 9 | 1.3x

* Index is an estimate of how prevalent a given emoji is in Brexit tweets and general tweets, with the most common emoji (😂) being defined as 100
** Reflects how much more likely a given emoji is to be used in a Brexit tweet vs. generally on Twitter (general rank and index obtained from emojitracker.com). An emoji overindexes on Brexit if both brexit rank < general rank AND brexit index > general index.
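The index and overindex definitions in the footnotes could be computed along these lines. This is a Python sketch under the assumption that the overindex is the ratio of the two indices; the function names `index_scores` and `overindex` are hypothetical. The slide's ratios were presumably computed from unrounded indices, so ratios of the rounded table values differ slightly.

```python
def index_scores(counts):
    """Scale raw emoji counts so the most common emoji scores 100."""
    top = max(counts.values())
    return {emoji: 100 * count / top for emoji, count in counts.items()}


def overindex(brexit_rank, general_rank, brexit_index, general_index):
    """Ratio of Brexit index to general index, or None when the emoji
    does not meet both conditions stated in the footnote."""
    if brexit_rank < general_rank and brexit_index > general_index:
        return brexit_index / general_index
    return None
```

For the thumbs up row, `overindex(3, 18, 26, 11)` is about 2.4 using the rounded indices, while the heavy black heart row returns `None` because its general rank and index are both higher.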
Finding the “hashtag signature” of a given emoji
• We know the distribution of hashtags in our entire dataset
• We can pick a given emoji and compute the distribution of hashtags for tweets that use that emoji
• By comparing these two distributions, we can estimate which hashtags an emoji is most likely to be used with
[Chart: shares of 15%, 17%, 20%, 29%, and 19%; segment labels not recoverable]
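One way to compare the two distributions is a simple ratio of shares, sketched below in Python. The slides do not say which comparison was used, so the ratio is an assumption, as are the input format (a list of `(hashtags, emojis)` pairs per tweet) and the names `hashtag_distribution` and `hashtag_signature`.

```python
from collections import Counter


def hashtag_distribution(tweets):
    """Share of each hashtag among all hashtag uses.

    tweets: list of (hashtags, emojis) pairs, one pair per tweet.
    """
    counts = Counter(tag for hashtags, _ in tweets for tag in set(hashtags))
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


def hashtag_signature(tweets, emoji):
    """Ratio of each hashtag's share in tweets using the emoji to its
    share in the whole dataset. Values above 1 suggest the emoji is used
    with that hashtag more often than expected."""
    overall = hashtag_distribution(tweets)
    with_emoji = hashtag_distribution([t for t in tweets if emoji in t[1]])
    return {tag: with_emoji.get(tag, 0.0) / overall[tag] for tag in overall}
```

A ratio is easy to read but becomes noisy for rare hashtags, so a minimum count threshold may be worth adding in practice.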
Hashtag signatures of the top emojis of Brexit
http://motherboard.vice.com/read/the-emojis-of-great-brexit
Taylor Swift is winning hearts (and minds)
Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)
[Chart legend: equal / higher association with @taylorswift13 / higher association with @kanyewest]
Hearts vs. Snakes:
The emoji battle underlying the epic Taylor Swift – Kanye West feud
Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)
#taylorswiftwhatup is the most common hashtag in tweets about both Taylor and Kanye
Source: Analysis of 100,000 public tweets mentioning @taylorswift13 and @kanyewest from Aug. 1-4, 2016. (PRISMOJI)
Our common emoji language of #fanlove
Source: Analysis of 250,000 public tweets mentioning @beyonce, @justinbieber, @djkhaled, @drake, and @rihanna from Aug. 1-4, 2016. (PRISMOJI)
Sometimes love hurts
Examples of in tweets involving #fanlove
Source: Analysis of 250,000 public tweets mentioning @beyonce, @justinbieber, @djkhaled, @drake, and @rihanna from Aug. 1-4, 2016. (PRISMOJI)
http://motherboard.vice.com/read/a-data-scientists-emoji-guide-to-kanye-west-and-taylor-swift
Some more examples
#firstsevenjobs
Source: Analysis of 32,979 public tweets with the hashtags #firstsevenjobs and #first7jobs from Aug. 8, 2016. (PRISMOJI)
Understanding gendered emojis on Twitter
#wcw vs #mcm: All hearts are not created equal
[Chart legend: higher association with #mcm / higher association with #wcw]
Source: Analysis of 100,000 public tweets with the hashtags #wcw and #mcm from June 27-29, 2016. (PRISMOJI)
#Rio2016 Olympics
Source: Analysis of 449,680 public tweets mentioning #rio2016 from Aug. 6-22, 2016. (PRISMOJI)
[Chart legend: higher association with FIRST 3 DAYS / higher association with LAST 3 DAYS]
Third Presidential Debate
Source: Analysis of public tweets during third presidential debate on Oct. 20, 2016. (PRISMOJI)
Three takeaways I’d like you to leave with
■ Understanding emojis as data can yield interesting insights
■ More work is needed to learn more about what emojis mean, and what they reveal about our world
■ You can play around with emoji data too
Thank you!
• Email: hamdan@prismoji.com
• Twitter: @hamdanazhar
• prismoji.com
• hamdanazhar.com

Introduction to emoji data science (Emojicon, 2016)
