Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter


Published on

Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words' sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.

Published in: Science
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Early work on Sentiment analysis focused mainly on extracting sentiment from conventional text such as movie reviews, blogs, news articles and open forums
    Textual content in these type of media sources is linguistically rich, consists of well structured and formal sentences, and discusses specific topic or domain (e.g., movie reviews)
  • However, with the emergent of social media networks and microblogging platforms, especially Twitter, research interests shifted to analyzing and extracting sentiment from theses new sources.

    Nevertheless, One of the key challenges that Twitter sentiment analysis methods have to confront is the noisy nature of Twitter generated data. Twitter allows only for 140 characters in each post, which influences the use of abbreviations, irregular expressions and infrequent words.

    This phenomena increases the level of data sparsity, affecting the performance of Twitter sentiment classifiers
  • There are several approaches to sentiment analysis.
    One common approach is the lexicon-based approach. This approach assumes that the sentiment orientations of a given
  • Words in the lexicons have fixed prior sentiment orientations, i.e. each term has always the same associated sentiment orientation independently of the context in which the term is used.
  • SentiCircles
  • SentiCircles
  • To build rules we need to look at the characteristics of the sentiment lexicon that we want to adapt.
  • in our work we use thelwall-lexicon as a case study and therefore, we built our adaptation rules base don the characteristics of this lexicon
  • As a case study
  • To build rules we need to look at the characteristics of the sentiment lexicon that we want to adapt.
  • To build rules we need to look at the characteristics of the sentiment lexicon that we want to adapt.
  • Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter

    1. 1. Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom 1st Workshop on Semantic Sentiment Analysis Greece, Crete 2014
    2. 2. • Sentiment Analysis • Sentiment Analysis Approaches • Sentiment Lexicons on Twitter • Sentiment Lexicon Adaptation Approach • Evaluation • Conclusion Outline
    3. 3. “Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text” 3 Opinion OpinionFact Sentiment Analysis yes, It is sunny, but also very humid :( The weather is great today :) I think its almost 30 degrees today
    4. 4. I had nightmares all night long last night :( Negative Sentiment Lexicon Text Processing Algorithm Sentiment Analysis The Lexicon-based Approach great sad down wrong horrible love Sentiment Analysis
    5. 5. Sentiment Lexicons - Lists of Opinionated: - Words and Phrases (MPQA, SentiWordNet, etc) - Common Sense Concepts (SenticNet) - Built: - Manually - Dictionary-based Approach - Corpus-based Approach - Applied to Conventional Text - Movie Reviews, News, Blogs, Open Forums, etc.
    6. 6. Sentiment Lexicons on Twitter Twitter Data - Language Variations - New Words - Noisy Nature - lol, gr8, :), :P Traditional Lexicons - Not tailored to Twitter noisy data - Fixed number of words
    7. 7. Twitter-specific Sentiment Lexicons - Such as: Thelwall-Lexicon - Built to specifically work on social data - Contain lists of emoticons, slangs, abbreviations, etc. - Coupled with rule-based method, SentiStrength - Apply text pre-processing routine on tweets
    8. 8. Twitter-specific Sentiment Lexicons Offer Context-Insensitive Prior Sentiment Orientations and Strength of words ..and Traditional Lexicons Great Problem Smile Sentiment Lexicon great sad down wrong horrible love Positive
    9. 9. Lexicons Adaptation Approaches Require Training from Labeled Corpora Supervised Unsupervised Use General Textual Corpora (e.g., WEB) or Static lexical knowledge sources (e.g., WordNet)
    10. 10. Contextual Semantic Adaptation Approach  Unsupervised Approach  Captures the Contextual Semantics of words  To assign Contextual Sentiment
    11. 11. Contextual Semantics of Words “Words that occur in similar context tend to have similar meaning” Wittgenstein (1953) Great Problem Look Smile Concert Song Weather Loss Game Taylor Swift Amazing Great
    12. 12. Capturing Contextual Semantics Term (m) C1 C2 Cn…. Context-Term Vector Degree of Correlation Prior SentimentSentiment Lexicon (1) (2) Great Smile Look SentiCircles Model (3) Contextual Sentiment Strength Contextual Sentiment Orientation Positive, Negative Neutral [-1 (very negative) +1 (very positive)]
    13. 13. Capturing Contextual Semantics Term (m) C1 Degree of Correlation Prior Sentiment Great Smile SentiCircles Model X = R * COS(θ) Y = R * SIN(θ) Smile X ri θi xi yi Great PositiveVery Positive Very Negative Negative +1 -1 +1-1 Neutral Region ri = TDOC(Ci) θi = Prior_Sentiment (Ci) * π
    14. 14. SentiCircles (Example)
    15. 15. Overall Contextual Sentiment Ci X ri θi xi yi m PositiveVery Positive Very Negative Negative +1 -1 +1-1 Neutral Region nwhicheachtermisused. Tocomputethenewsentiment of tiCircleweusetheSenti-Median metric. Wenow havethe hichiscomposedbytheset of (x, y) Cartesiancoordinatesof wherethey valuerepresentsthesentiment andthex value ength. Aneffectiveway toapproximatetheoverall sentiment y calculatingthegeometricmedianof all itspoints. Formally, (p1, p2, ..., pn ) inaSentiCircle⌦, the2Dgeometricmedian g = arg min g2 R2 nX i = 1 k|pi − g||2, (5) Senti-Median of SentiCircle Sentiment Function
    16. 16. Lexicon Adaptation Method • A set of Antecedent-Consequent Rules • Decides on the new sentiment of a term based on: – How Weak/Strong its Prior Sentiment – How Weak/Strong its Contextual Sentiment • Based on the Position of the term’s SentiMedian
    17. 17. Thelwall-Lexicon Case Study fiery -2 fiery -2 vex*-3 fiery -2 witch -1 inspir* 3 fiery* -2 trite* -3 fiery -2 cunt* -4 fiery -2 fiery* -2 intelligent* 2 fiery -2 joll* 3 fiery* -2 fiery* -2 suffers -4 fiery -2 loved 4 insidious* -3 despis* -4 fiery* -2 hehe* 2 398 1919 229 0 500 1000 1500 2000 2500 Positive Negative Neutral • Consists of 2546 terms • Coupled with prior sentiment strength between |1| and |5| [-2, -5] negative term [2, 5] positive term [-1, 1] neutral term
    18. 18. Adaptation Rules on Thelwall-Lexicon Prior Sentiment < -3 (week negative) Revolution Contextual Sentiment = Neutral Change to Neutral Rule 10
    19. 19. Experiments • Sentiment Lexicon – Thelwall-Lexicon • Settings: – Update Setting – Expand Setting – Update + Expand Setting • Datasets • Binary Sentiment Classification – SentiStrength • Lexicon-based Method • Work on Thelwall-Lexicon
    20. 20. Results Adaptation Impact on Thelwall-Lexicon
    21. 21. Results Cross comparison results of the original and the adapted lexicons
    22. 22. Adapted Lexicons on HCR Performance 35 37 39 41 43 45 Precision Recall F1 Positive Sentiment Detection Original Updated Updated+Expanded Sentiment Class Distribution 0.35 0.4 0.45 0.5 0.55 0.6 OMD HCR STS-Gold Positive to Negative Ratio Impact on Thelwall-Lexicon 10 15 20 25 30 OMD HCR STS-Gold New Words Added To Thelwall-Lexicon
    23. 23. Conclusion • We proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data. • It update the words’ prior sentiment orientations and/or strength based on their contextual semantics in tweets • The evaluation was done on Thelwall-Lexicon using three Twitter datasets. • Results showed that lexicons adapted by our approach improved the sentiment classification performance in both accuracy and F1 in two out of three datasets.
    24. 24. Thank You Email: Twitter: hrsaif Website: