SlideShare a Scribd company logo
A Semantics-Based Measure of
Emoji Similarity
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH, USA
Presented at the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI’17)
Leipzig, Germany, 23rd – 26th August, 2017.
Lakshika Balasuriya
lakshika@knoesis.org
Sanjaya Wijeratne
sanjaya@knoesis.org
Derek Doran
derek@knoesis.org
Amit Sheth
amit@knoesis.org
1Appboy Blog – https://www.appboy.com/blog/emojis-used-in-777-more-campaigns/
2Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
Businesses extensively use emoji in consumer applications
Dominos App support ordering pizza by texting a pizza emoji
Recent research shows emoji can improve sentiment, emotion
and sarcasm detection (MIT’s DeepMoji)
3
Why Emoji Similarity?
 Measuring the Emoji Similarity can lead to
improvements in numerous tasks including;
 Sentiment Analysis – Emoji features improve sentiment
analysis
 Emoji-based Search – Similar emoji can improve recall
 Designing of optimized emoji keyboards – To provide
intuitive emoji groups to show ~2,700 emoji on small-sized
screens
 Suggesting alternative emoji domains – If an emoji domain
is already taken
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
4
Emoji Similarity Problem
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 The notion of the similarity of two emoji is very broad
 Pixel-based emoji similarity – Would not work as different platforms use
different emoji pictures to represent the same emoji (Miller et al., 2016)
Can we devise a notion of emoji similarity that
reflects the meaning or intended use of the emoji?
5
Related Research
 EmoTwi50 by Barbieri et al.: Used Word2Vec to learn
distributional semantics of emoji usage to calculate emoji
similarity
 emoji2vec – Eisner et al.: Used emoji keywords (4 words on
average) from Unicode Consortium website and converted them
to emoji vectors using distributional semantics of words learned
via Word2Vec
 Phol et al.
 Used Word2Vec to learn distributional semantics of emoji
 Used Jaccard Similarity on emoji keywords obtained from the
Unicode Consortium (Similar to Wijeratne et al. ICWSM 2017)
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
6
Our Emoji Embedding Approach
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Extract emoji definitions from a knowledge base of
emoji senses and definitions (EmojiNet)
 Learn distributional semantics of words as word
embeddings using two corpora (Tweets and Google
News)
 Convert the words in emoji meanings to vectors using
word embeddings (emoji embeddings)
 Evaluate the similarity (distance) of emoji in the
embedding space using EmoSim508, a new dataset
with 508 emoji pairs
7
Building EmojiNet
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
The Unicode Consortium maintains a complete list of the standardized Unicodes for each
emoji along with other information such as manually curated keywords and images.
Emojipedia is a human-created emoji reference website that provides useful emoji
information such as emoji descriptions and short codes.
The Emoji Dictionary is the first crowdsourced emoji reference website that provides
emoji definitions with their sense labels based on how emoji are used in sentences.
BabelNet is the most comprehensive multilingual machine-readable semantic network
available to date that links words into their corresponding Wikipedia pages.
8
EmojiNet
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
Figure 1 – EmojiNet
9
Representing Emoji using Definitions
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Emoji Description (Sense_Desc.)
 Gives an overview of what is depicted in an emoji – E.g., “A gun emoji,
more precisely a pistol. A weapon that has potential to cause great harm.
Displayed facing right-to-left on all platforms”
 Emoji Sense Labels (Sense_Label)
 Word-POS tag pairs (such as laugh(noun)). For gun emoji, EmojiNet lists
6 nouns (gun, weapon, pistol, violence, revolver, handgun), 3 verbs
(shoot, gun, pistol) and 3 adjectives (deadly, violent, deathly)
 Emoji Sense Definitions (Sense_Def.)
 Textual descriptions that explain each sense label and how those sense
labels should be used in a sentence
 Combination of all of the above (Sense_All)
10
Learning Emoji Embeddings
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
Figure 2 – Learning Emoji Embedding Models using Word Vectors
11
Ground Truth Data Creation
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 110M Tweets were used to identify 508 most frequently co-
occurred emoji pairs, which covered 25% of the dataset. The full
emoji list is at –http://emojinet.knoesis.org/emojipairs508.htm
Figure 3 – Emoji Co-Occurrence Frequency Graph
12
Ground Truth Data Creation Cont.
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Given an emoji pair, we asked 10 human annotators to
evaluate them for equivalence (Q1) and relatedness (Q2)
 Q1 – Can the use of one emoji be replaced by the other?
 Q2 – Can one use either emoji in the same context?
 Lickert scale from 0 to 4 was used to rate equivalence and
relatedness
 We averaged the equivalence and relatedness values for each
emoji to calculate the final similarity value for the two emoji –
http://emojinet.knoesis.org/emojipairs508_userstudy.htm
 Krippendor’s alpha coefficient was used to calculate the inter
annotator agreement (α for Q1 = 0.632 and α for Q2 = 0.567)
13
Intrinsic Evaluation
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Using four different emoji definitions (Sense_Desc.,
Sense_Label, Sense_Def., Sense_All) and two corpora
(Twitter and Google News), we trained eight emoji
embedding models for each emoji
 We calculated emoji similarity of the 508 emoji pairs
using each embedding model
 Using Spearman’s Rank Correlation Coefficient
(Spearman’s ρ), we compared the similarity rankings of
each model with ground truth data
14
Intrinsic Evaluation Cont.
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Except for Sense_Desc.-based embedding models which
correlated moderately with ground truth data (40.0 < ρ < 59.0), all
other models show a strong correlation (60.0 < ρ < 79.0)
 Sense_Labels-based embedding models correlate best with
ground truth data
Table 1 – Spearman’s Rank Correlation Results
15
Extrinsic Evaluation
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 We tested our emoji embedding models using a
sentiment analysis baseline
 Our baseline had 12,920 English tweets, and 2,295 of them
had emoji
 All words in the tweets were replaced with their
corresponding word embeddings and emoji were replaced
with emoji embeddings learned
 Emoji embeddings based on Sense-labels outperformed
previous best performing model
16
Extrinsic Evaluation Cont.
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
Table 2 – Accuracy of the Sentiment Analysis task using Emoji Embeddings
17
Key Takeaways
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 We created a dataset that can be used to evaluate emoji
similarity tasks – Download link http://emojinet.knoesis.org/
 Combining emoji sense knowledge with distributional semantics
could improve the emoji embedding models
 Longer sense definitions are not suitable to learn emoji
embeddings
 Longer definitions contain words that do not directly contribute to the
meaning of an emoji, thus, add noise to the learned embedding models.
 Sense labels, which are directly related to the meaning of emoji tend to
result in better embedding models
18
Future Work
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
 Extend emoji embeddings to capture the differences in
emoji interpretations due to how they appear across
different platforms or devices
 Use emoji embedding models in other downstream
applications such as emoji-based search
19
Acknowledgement
We are thankful to the annotators who helped us in creating the EmoSim508 dataset.
We acknowledge partial support from the National Institute on Drug Abuse (NIDA)
Grant No. 5R01DA039454- 03: “Trending: Social Media Analysis to Monitor Cannabis
and Synthetic Cannabinoid Use”, and the National Science Foundation (NSF) award:
CNS-1513721: “Context-Aware Harassment Detection on Social Media”. Points of view
or opinions in this document are those of the authors and do not necessarily represent
the official position or policies of the NIDA or NSF.
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
20
Related Publications
 Sanjaya Wijeratne et al. A Semantics-Based Measure of Emoji
Similarity (Web Intelligence 2017)
 Sanjaya Wijeratne et al. EmojiNet: An Open Service and API for
Emoji Sense Discovery (ICWSM 2017)
 Sanjaya Wijeratne et al. EmojiNet: Building a Machine Readable
Sense Inventory for Emoji (SocInfo 2016)
Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
21SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identificationamit@knoesis.org http://knoesis.org/amit
Thank You!
Visit us at http://emojinet.knoesis.org/

More Related Content

Similar to A Semantics-Based Measure of Emoji Similarity

Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Association for Computational Linguistics
 
A Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment AnalysisA Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment Analysis
Richard Hogue
 
Extraction of Emoticons with Sentimental Bar
Extraction of Emoticons with Sentimental BarExtraction of Emoticons with Sentimental Bar
Extraction of Emoticons with Sentimental Bar
vivatechijri
 
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNINGTHE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
IRJET Journal
 
IRJET- Dynamic Emotion Recognition and Emoji Generation
IRJET- Dynamic Emotion Recognition and Emoji GenerationIRJET- Dynamic Emotion Recognition and Emoji Generation
IRJET- Dynamic Emotion Recognition and Emoji Generation
IRJET Journal
 
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
Data Con LA
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
Savio Aberneithie
 
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET Journal
 
12101-56982-3-PB.pdf
12101-56982-3-PB.pdf12101-56982-3-PB.pdf
12101-56982-3-PB.pdf
SyauqiRahmat1
 
N01741100102
N01741100102N01741100102
N01741100102
IOSR Journals
 
Sentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversionSentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversion
IJECEIAES
 
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
SHAILENDRA KUMAR SINGH
 
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Sara Alvarez
 
Issues in Sentiment analysis
Issues in Sentiment analysisIssues in Sentiment analysis
Issues in Sentiment analysis
IOSR Journals
 
Itamoji: Italian Emoji Prediction @ Evalita 2018
Itamoji: Italian Emoji Prediction @ Evalita 2018Itamoji: Italian Emoji Prediction @ Evalita 2018
Itamoji: Italian Emoji Prediction @ Evalita 2018
University of Torino
 
Type and Category of Memes used on social media
Type and Category of Memes used on social media Type and Category of Memes used on social media
Type and Category of Memes used on social media
HennaAnsari
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in Twitter
IRJET Journal
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...
IRJET Journal
 
Evaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on BrexitEvaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on Brexit
IAESIJAI
 
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
IRJET Journal
 

Similar to A Semantics-Based Measure of Emoji Similarity (20)

Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
Noa Ha'aman - 2017 - MojiSem: Varying Linguistic Purposes of Emoji in (Twitte...
 
A Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment AnalysisA Context-Based Algorithm For Sentiment Analysis
A Context-Based Algorithm For Sentiment Analysis
 
Extraction of Emoticons with Sentimental Bar
Extraction of Emoticons with Sentimental BarExtraction of Emoticons with Sentimental Bar
Extraction of Emoticons with Sentimental Bar
 
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNINGTHE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
 
IRJET- Dynamic Emotion Recognition and Emoji Generation
IRJET- Dynamic Emotion Recognition and Emoji GenerationIRJET- Dynamic Emotion Recognition and Emoji Generation
IRJET- Dynamic Emotion Recognition and Emoji Generation
 
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
Deriving Conversational Insight by Learning Emoji Representation by Jeff Wein...
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
 
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
IRJET- Sentimental Analysis on Audio and Video using Vader Algorithm -Monali ...
 
12101-56982-3-PB.pdf
12101-56982-3-PB.pdf12101-56982-3-PB.pdf
12101-56982-3-PB.pdf
 
N01741100102
N01741100102N01741100102
N01741100102
 
Sentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversionSentimental analysis of audio based customer reviews without textual conversion
Sentimental analysis of audio based customer reviews without textual conversion
 
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment AnalysisClassification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
Classification of Code-Mixed Bilingual Phonetic Text Using Sentiment Analysis
 
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
Application Of Sentiment Lexicons On Movies Transcripts To Detect Violence In...
 
Issues in Sentiment analysis
Issues in Sentiment analysisIssues in Sentiment analysis
Issues in Sentiment analysis
 
Itamoji: Italian Emoji Prediction @ Evalita 2018
Itamoji: Italian Emoji Prediction @ Evalita 2018Itamoji: Italian Emoji Prediction @ Evalita 2018
Itamoji: Italian Emoji Prediction @ Evalita 2018
 
Type and Category of Memes used on social media
Type and Category of Memes used on social media Type and Category of Memes used on social media
Type and Category of Memes used on social media
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in Twitter
 
The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...The Identification of Depressive Moods from Twitter Data by Using Convolution...
The Identification of Depressive Moods from Twitter Data by Using Convolution...
 
Evaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on BrexitEvaluating sentiment analysis and word embedding techniques on Brexit
Evaluating sentiment analysis and word embedding techniques on Brexit
 
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
IRJET- Review on Mood Detection using Image Processing and Chatbot using Arti...
 

Recently uploaded

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
Peter Windle
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
CarlosHernanMontoyab2
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Atul Kumar Singh
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Po-Chuan Chen
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 

Recently uploaded (20)

Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Embracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic ImperativeEmbracing GenAI - A Strategic Imperative
Embracing GenAI - A Strategic Imperative
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf678020731-Sumas-y-Restas-Para-Colorear.pdf
678020731-Sumas-y-Restas-Para-Colorear.pdf
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Guidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th SemesterGuidance_and_Counselling.pdf B.Ed. 4th Semester
Guidance_and_Counselling.pdf B.Ed. 4th Semester
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdfAdversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
Adversarial Attention Modeling for Multi-dimensional Emotion Regression.pdf
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 

A Semantics-Based Measure of Emoji Similarity

  • 1. A Semantics-Based Measure of Emoji Similarity Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State University, Dayton, OH, USA Presented at the 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI’17) Leipzig, Germany, 23rd – 26th August, 2017. Lakshika Balasuriya lakshika@knoesis.org Sanjaya Wijeratne sanjaya@knoesis.org Derek Doran derek@knoesis.org Amit Sheth amit@knoesis.org
  • 2. 1Appboy Blog – https://www.appboy.com/blog/emojis-used-in-777-more-campaigns/ 2Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017. Businesses extensively use emoji in consumer applications Dominos App support ordering pizza by texting a pizza emoji Recent research shows emoji can improve sentiment, emotion and sarcasm detection (MIT’s DeepMoji)
  • 3. 3 Why Emoji Similarity?  Measuring the Emoji Similarity can lead to improvements in numerous tasks including;  Sentiment Analysis – Emoji features improve sentiment analysis  Emoji-based Search – Similar emoji can improve recall  Designing of optimized emoji keyboards – To provide intuitive emoji groups to show ~2,700 emoji on small-sized screens  Suggesting alternative emoji domains – If an emoji domain is already taken Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
  • 4. 4 Emoji Similarity Problem Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  The notion of the similarity of two emoji is very broad  Pixel-based emoji similarity – Would not work as different platforms use different emoji pictures to represent the same emoji (Miller et al., 2016) Can we devise a notion of emoji similarity that reflects the meaning or intended use of the emoji?
  • 5. 5 Related Research  EmoTwi50 by Barbieri et al.: Used Word2Vec to learn distributional semantics of emoji usage to calculate emoji similarity  emoji2vec – Eisner et al.: Used emoji keywords (4 words on average) from Unicode Consortium website and converted them to emoji vectors using distributional semantics of words learned via Word2Vec  Phol et al.  Used Word2Vec to learn distributional semantics of emoji  Used Jaccard Similarity on emoji keywords obtained from the Unicode Consortium (Similar to Wijeratne et al. ICWSM 2017) Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
  • 6. 6 Our Emoji Embedding Approach Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Extract emoji definitions from a knowledge base of emoji senses and definitions (EmojiNet)  Learn distributional semantics of words as word embeddings using two corpora (Tweets and Google News)  Convert the words in emoji meanings to vectors using word embeddings (emoji embeddings)  Evaluate the similarity (distance) of emoji in the embedding space using EmoSim508, a new dataset with 508 emoji pairs
  • 7. 7 Building EmojiNet Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017. The Unicode Consortium maintains a complete list of the standardized Unicodes for each emoji along with other information such as manually curated keywords and images. Emojipedia is a human-created emoji reference website that provides useful emoji information such as emoji descriptions and short codes. The Emoji Dictionary is the first crowdsourced emoji reference website that provides emoji definitions with their sense labels based on how emoji are used in sentences. BabelNet is the most comprehensive multilingual machine-readable semantic network available to date that links words into their corresponding Wikipedia pages.
  • 8. 8 EmojiNet Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017. Figure 1 – EmojiNet
  • 9. 9 Representing Emoji using Definitions Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Emoji Description (Sense_Desc.)  Gives an overview of what is depicted in an emoji – E.g., “A gun emoji, more precisely a pistol. A weapon that has potential to cause great harm. Displayed facing right-to-left on all platforms”  Emoji Sense Labels (Sense_Label)  Word-POS tag pairs (such as laugh(noun)). For gun emoji, EmojiNet lists 6 nouns (gun, weapon, pistol, violence, revolver, handgun), 3 verbs (shoot, gun, pistol) and 3 adjectives (deadly, violent, deathly)  Emoji Sense Definitions (Sense_Def.)  Textual descriptions that explain each sense label and how those sense labels should be used in a sentence  Combination of all of the above (Sense_All)
  • 10. 10 Learning Emoji Embeddings Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017. Figure 2 – Learning Emoji Embedding Models using Word Vectors
  • 11. 11 Ground Truth Data Creation Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  110M Tweets were used to identify 508 most frequently co- occurred emoji pairs, which covered 25% of the dataset. The full emoji list is at –http://emojinet.knoesis.org/emojipairs508.htm Figure 3 – Emoji Co-Occurrence Frequency Graph
  • 12. 12 Ground Truth Data Creation Cont. Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Given an emoji pair, we asked 10 human annotators to evaluate them for equivalence (Q1) and relatedness (Q2)  Q1 – Can the use of one emoji be replaced by the other?  Q2 – Can one use either emoji in the same context?  Lickert scale from 0 to 4 was used to rate equivalence and relatedness  We averaged the equivalence and relatedness values for each emoji to calculate the final similarity value for the two emoji – http://emojinet.knoesis.org/emojipairs508_userstudy.htm  Krippendor’s alpha coefficient was used to calculate the inter annotator agreement (α for Q1 = 0.632 and α for Q2 = 0.567)
  • 13. 13 Intrinsic Evaluation Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Using four different emoji definitions (Sense_Desc., Sense_Label, Sense_Def., Sense_All) and two corpora (Twitter and Google News), we trained eight emoji embedding models for each emoji  We calculated emoji similarity of the 508 emoji pairs using each embedding model  Using Spearman’s Rank Correlation Coefficient (Spearman’s ρ), we compared the similarity rankings of each model with ground truth data
  • 14. 14 Intrinsic Evaluation Cont. Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Except for Sense_Desc.-based embedding models which correlated moderately with ground truth data (40.0 < ρ < 59.0), all other models show a strong correlation (60.0 < ρ < 79.0)  Sense_Labels-based embedding models correlate best with ground truth data Table 1 – Spearman’s Rank Correlation Results
  • 15. 15 Extrinsic Evaluation Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  We tested our emoji embedding models using a sentiment analysis baseline  Our baseline had 12,920 English tweets, and 2,295 of them had emoji  All words in the tweets were replaced with their corresponding word embeddings and emoji were replaced with emoji embeddings learned  Emoji embeddings based on Sense-labels outperformed previous best performing model
  • 16. 16 Extrinsic Evaluation Cont. Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017. Table 2 – Accuracy of the Sentiment Analysis task using Emoji Embeddings
  • 17. 17 Key Takeaways Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  We created a dataset that can be used to evaluate emoji similarity tasks – Download link http://emojinet.knoesis.org/  Combining emoji sense knowledge with distributional semantics could improve the emoji embedding models  Longer sense definitions are not suitable to learn emoji embeddings  Longer definitions contain words that do not directly contribute to the meaning of an emoji, thus, add noise to the learned embedding models.  Sense labels, which are directly related to the meaning of emoji tend to result in better embedding models
  • 18. 18 Future Work Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.  Extend emoji embeddings to capture the differences in emoji interpretations due to how they appear across different platforms or devices  Use emoji embedding models in other downstream applications such as emoji-based search
  • 19. 19 Acknowledgement We are thankful to the annotators who helped us in creating the EmoSim508 dataset. We acknowledge partial support from the National Institute on Drug Abuse (NIDA) Grant No. 5R01DA039454- 03: “Trending: Social Media Analysis to Monitor Cannabis and Synthetic Cannabinoid Use”, and the National Science Foundation (NSF) award: CNS-1513721: “Context-Aware Harassment Detection on Social Media”. Points of view or opinions in this document are those of the authors and do not necessarily represent the official position or policies of the NIDA or NSF. Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
  • 20. 20 Related Publications  Sanjaya Wijeratne et al. A Semantics-Based Measure of Emoji Similarity (Web Intelligence 2017)  Sanjaya Wijeratne et al. EmojiNet: An Open Service and API for Emoji Sense Discovery (ICWSM 2017)  Sanjaya Wijeratne et al. EmojiNet: Building a Machine Readable Sense Inventory for Emoji (SocInfo 2016) Wijeratne, Sanjaya et al. A Semantics-Based Measure of Emoji Similarity, Web Intelligence 2017.
  • 21. 21SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identificationamit@knoesis.org http://knoesis.org/amit Thank You! Visit us at http://emojinet.knoesis.org/

Editor's Notes

  1. Sentiment Analysis – It has been shown that using emoji as features can improve sentiment analysis, so similar emoji can further strengthen sentiment analysis results. Emoji-based Search – Similar words can improve the recall of information retrieval and search. Similarly, similar emoji can help improving the recall for emoji-based search. Optimized Keyboard Design – Currently, there are close to 2,700 emoji supported by the Unicode Consortium and none of the emoji keyboards can show the full list of emoji in small display screens. Emoji similarity can be used to group emoji based on their similarity, which could lead to optimized emoji keyboard designs. Emoji domain name suggestions – Emoji domain names are getting popular now and emoji similarity can be used to suggest alternate emoji domain names if a particular domain name is already taken.
  2. We are interested in measuring the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation or intended use.
  3. EmoTwi50 by Barbieri et al. – They used a collection of tweets to train a word2vec model which they then used to learn the distributional semantics of emoji. emoji2vec – Eisner et al. – They represented an emoji using the keywords listed in the Unicode Consortium Webpage. Then they replaced each keyword listed for emoji using their corresponding word embeddings learned by the Google News corpus and summed the word embeddings for all keywords belong to a given emoji. They used the resulting embedding model as the emoji embedding for a given emoji. Phol et al. – They learned distributional semantics of emoji the same way Barbieri et al. learned using Word2Vec. They also calculated Jaccard Similarity of two emoji using the overlap between the emoji keywords listed in the Unicode Consortium website. This is similar to our previous work which we published in ICWSM 2017 (Link - http://knoesis.org/people/sanjayaw/papers/2017/ICWSM_2017_EmojiNet_Final_Wijeratne.pdf).
  4. Our approach is based on representing emoji definitions available in a machine-readable emoji sense inventory using distributional semantic models to learned over a large corpus of text. This is the first effort that uses a machine-readable sense inventory with distributional semantics models.
  5. We use Word2Vec model to first learn word vectors using a large corpus of text. Then for each word in each emoji definition, we replace the word with its corresponding word vector. Then we take the vector average of all vectors that belongs to words in an emoji difinition.
  6. We didn’t want to hand pick emoji pairs as past studies have done. At the same time, we didn’t want to randomly select emoji pairs too, as it is shown that the random emoji pairs are not meaningful in majority of the cases. Thus, we wanted to come up with a way to get a meaningful set of emoji pairs without hand picking them. To do that, we followed the below approach. We first collected 147 million tweets by using emoji as query terms. Then we removed retweets, which gave us a sample of 110 million unique tweets. Then, we created a graph where we plot the most frequently co-occurring emoji pairs against the co-occurring frequency. We selected the 508 most co-occurring emoji pairs as those 508 emoji pairs alone covers 25% of the whole dataset (110 million).