Semantic Patterns for Sentiment Analysis of Twitter

Most existing approaches to Twitter sentiment analysis assume that sentiment is explicitly expressed through affective words. Nevertheless, sentiment is often implicitly expressed via latent semantic relations, patterns and dependencies among words in tweets. In this paper, we propose a novel approach that automatically captures patterns of words of similar contextual semantics and sentiment in tweets. Unlike previous work on sentiment pattern extraction, our proposed approach does not rely on external and fixed sets of syntactical templates/patterns, nor does it require deep analysis of the syntactic structure of sentences in tweets.
We evaluate our approach with tweet- and entity-level sentiment analysis tasks by using the extracted semantic patterns as classification features in both tasks. We use 9 Twitter datasets in our evaluation and compare the performance of our patterns against 6 state-of-the-art baselines. Results show that our patterns consistently outperform all other baselines on all datasets by 2.19% at the tweet-level and 7.5% at the entity-level in average F-measure.

Published in: Social Media

  1. Semantic Patterns for Sentiment Analysis of Twitter. Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani. The 13th International Semantic Web Conference (ISWC2014), May 2014.
  2. Outline: Sentiment Analysis; Traditional Sentiment Analysis; Pattern-based Sentiment Analysis; Semantic Sentiment Patterns; Evaluation; Results; Conclusion.
  3. Sentiment Analysis: “Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text.” Examples: “Nooo, it is very humid :(” (opinion); “The weather is great today :)” (opinion); “I think its almost 30 degrees today” (fact).
  4. Traditional Sentiment Analysis: (1) The lexicon-based approach: a sentiment lexicon (e.g., great, sad, down, wrong) is applied to a tweet such as “Just got my new iPhone 6, looks and feel great! :D”. (2) The machine learning approach: training features include syntactic features (letter n-grams, word n-grams, POS tags, etc.) and linguistic features (synonyms, glosses, etc.).
  5. Traditional Sentiment Analysis: However, sentiment is often expressed via more subtle relations, patterns and dependencies among words in tweets. Example: in “Destroy Invading Germs”, the word “Destroy” is negative and “Invading Germs” is a negative concept, yet the overall sentiment is positive.
  6. Pattern-based Sentiment Analysis: syntactic pattern approaches and semantic pattern approaches.
  7. Syntactic Pattern Approaches • Based on syntactic relations between words. • Rely on predefined POS templates, e.g., “<subject> passive-verb” (<customer> was satisfied) and “<subject> active-verb” (<she> complained). • But they are semantically weak: the template “<subject> verb cold” matches both “<beer> is cold” and “<weather> is cold”.
  8. Semantic Pattern Approaches • Apply syntactic and semantic processing techniques. • Use external semantic resources (ontologies, semantic networks, etc.). • Capture the conceptual semantic relations in text that implicitly convey sentiment, e.g., “Happy birthday” (positive) and “Invading Germs” (negative).
  9. Syntactic & Semantic Pattern Approaches are not tailored to Twitter.
  10. Syntactic & Semantic Pattern Approaches are designed to function on formal text, that is, text that is: 1. Long enough 2. Well-structured 3. Made up of formal sentences.
  11. Tweets are often • Short! • Noisy and messy • Made up of informal and ill-structured sentences.
  12. We Propose a pattern-based approach that • Works on Twitter. • Does not rely on the syntactic structures of tweets or on pre-defined syntactic templates. • Does not rely on semantic knowledge sources. • Automatically extracts patterns from the contextual semantic and sentiment similarities of words in tweets.
  13. Contextual Semantics and Sentiment: Contextual Semantics • Contextual semantics refers to semantics inferred from words’ co-occurrences in tweets. “Words that occur in similar context tend to have similar meaning”, Wittgenstein (1953). Example: “Trojan Horse” co-occurs in one context with Threat, Hack, Dangerous, Code, Program, Harm, Malware, and in another with Greek, Tale, History, Troy, Wooden, Class.
  14. Contextual Semantic Sentiment Patterns: “Some words in different tweets tend to come with similar contextual semantics and sentiment, forming therefore specific clusters or patterns.” Example context of “Trojan Horse”: Threat, Hack, Code, Dangerous, Spyware, Program, Harm, Malware.
  15. Contextual Semantic Sentiment Patterns: The context of “Trojan Horse” (Threat, Hack, Code, Dangerous, Spyware, Program, Harm, Malware) forms a negative contextual pattern. C_Semantics(Worms), C_Semantics(Adware) and C_Semantics(Time bombs) follow the same pattern.
  16. Pattern Extraction pipeline. Inputs: tweets and a sentiment lexicon. Steps: 1. Syntactical preprocessing of tweets. 2. Capturing the contextual semantics and sentiment of words (output: bag of SentiCircles). 3. Extracting semantic sentiment patterns (output: bag of SS-Patterns).
  17. (1) Syntactical Preprocessing • All URL links are replaced with the term “URL”. • All non-ASCII and non-English characters are removed. • Words that contain repeated letters are reverted to their original English form, e.g., “maaadddd” is converted to “mad”.
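The three preprocessing steps on slide 17 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `preprocess`, the URL regular expression, and the rule of collapsing any run of three or more identical letters to a single letter are all assumptions. The slides do not say how repeated letters are mapped back to an English word, and this simple rule would mangle words such as "coool".

```python
import re

# Assumed URL pattern; the slides do not specify how links are detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumed elongation rule: a letter repeated three or more times in a row.
REPEAT_RE = re.compile(r"([A-Za-z])\1{2,}")


def preprocess(tweet: str) -> str:
    """Apply the three cleaning steps listed on the slide to one tweet."""
    # Step 1: replace every URL with the literal token "URL".
    text = URL_RE.sub("URL", tweet)
    # Step 2: drop all non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Step 3: collapse elongated letter runs ("maaadddd" becomes "mad").
    text = REPEAT_RE.sub(r"\1", text)
    # Normalise whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = preprocess("so maaadddd http://t.co/x")
```

A dictionary lookup over candidate spellings would be a more faithful way to recover the "original English form", at the cost of needing a word list.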
  18. (2) Capturing Contextual Semantics & Sentiment: The SentiCircle Approach. For a term m (e.g., “Trojan Horse”), each context term Ci (e.g., “Dangerous”) is placed in a circle at the point (xi, yi), where xi = ri * cos(θi) and yi = ri * sin(θi). The radius ri = TDOC(Ci) is the degree of correlation and the angle θi = Prior_Sentiment(Ci) * π is set by the prior sentiment. Both axes range from -1 to +1, and the circle is divided into Very Positive, Positive, Negative and Very Negative quadrants plus a Neutral region. Context terms shown in the SentiCircle of “Trojan Horse”: useful, discover, easily, fix, dangerous, destroy, threat, malicious, attack. The overall contextual sentiment is the Senti-Median. Source: Saif, H., Fernandez, M., He, Y. and Alani, H. (2014) SentiCircles for Contextual and Conceptual Semantic Sentiment Analysis of Twitter, ESWC2014.
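The polar-to-Cartesian mapping on slide 18 is small enough to write out directly. The sketch below assumes the TDOC value and the prior sentiment (taken here to lie in [-1, 1]) are already available as numbers; the function name `senticircle_point` is hypothetical, and the computation of TDOC itself is not shown because the slides do not define it.

```python
import math


def senticircle_point(tdoc: float, prior_sentiment: float) -> tuple[float, float]:
    """Place one context term in the SentiCircle.

    tdoc            -- radius r_i, the degree of correlation with the target term
    prior_sentiment -- lexicon score, assumed to lie in [-1, 1]
    """
    theta = prior_sentiment * math.pi   # theta_i = Prior_Sentiment(C_i) * pi
    x = tdoc * math.cos(theta)          # x_i = r_i * cos(theta_i)
    y = tdoc * math.sin(theta)          # y_i = r_i * sin(theta_i)
    return x, y


# A mildly positive term lands in the upper half, a mildly negative one in the lower half.
pos_x, pos_y = senticircle_point(1.0, 0.5)
neg_x, neg_y = senticircle_point(1.0, -0.5)
```

With this mapping the sign of y follows the sign of the prior sentiment, and terms with a prior sentiment near zero stay close to the positive x-axis, which corresponds to the Neutral region on the slide.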
  19. (3) Extracting Semantic Sentiment Patterns: Patterns are extracted by finding clusters of similar SentiCircles. (1) Each SentiCircle (e.g., those of iPod, Spyware, Oprah, Obama) is represented by a feature vector describing its geometry, density and dispersion. (2) K-means is applied to the SentiCircles’ feature vectors to produce the SS-Patterns.
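The clustering step on slide 19 can be illustrated with a plain k-means (Lloyd's algorithm) over per-term feature vectors. Everything concrete below is an assumption made for the example: the three numbers per term are made-up stand-ins for the geometry, density and dispersion features, "Malware" is added only to give each cluster two members, and the initial centroids are simply the first k vectors so that the run is deterministic.

```python
import math


def kmeans(vectors, k, iterations=50):
    """Plain Lloyd's k-means; centroids start at the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Made-up feature vectors, one per term (illustrative values only).
names = ["Spyware", "iPod", "Malware", "Oprah"]
vectors = [
    [-0.60, -0.50, 0.20],
    [0.50, 0.60, 0.30],
    [-0.70, -0.40, 0.25],
    [0.40, 0.50, 0.35],
]
labels = kmeans(vectors, k=2)
patterns = {}
for name, label in zip(names, labels):
    patterns.setdefault(label, []).append(name)
```

Each entry of `patterns` then plays the role of one SS-Pattern: a group of terms whose SentiCircles look alike. In practice a library implementation with better initialisation would be preferable to this hand-rolled loop.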
  20. Evaluation: SS-Patterns are used to train sentiment classifiers for two tasks. Entity-level sentiment analysis: detect the sentiment (Positive, Negative, Neutral) of named entities extracted from tweets. Tweet-level sentiment analysis: detect the overall sentiment (Positive, Negative) of a tweet.
  21. Evaluation Setup (1): Sentiment Classifiers – Tweet-level: Maximum Entropy (MaxEnt) and Naïve Bayes (NB). – Entity-level: MLE classifier.
  22. Evaluation Setup (2): Datasets – Tweet-level: 9 Twitter datasets. – Entity-level: 58 manually annotated named entities.
  23. Evaluation Setup (3): Baseline Features. Syntactic features: Unigrams (individual unique terms in tweets); POS features (words’ part-of-speech tags); Twitter features (usernames, emoticons, hashtags, etc.); Lexicon features (prior sentiment of words in a given sentiment lexicon, e.g., great -> positive, destroy -> negative). Semantic features: LDA-topic features (topics generated by LDA); Semantic concepts (semantic concepts of named entities in tweets, e.g., Obama -> Person, London -> City).
  24. Results
  25. Tweet-Level Sentiment Analysis (1): The baseline model is a sentiment classifier trained on word unigram features. • MaxEnt outperforms NB in average accuracy and F1-measure.
  26. Tweet-Level Sentiment Analysis (2): Win/loss in accuracy and F-measure of using different features for sentiment classification on all nine datasets.
  27. Entity-Level Sentiment Analysis: SS-Patterns produce 6.31% and 7.5% higher accuracy and F-measure than other features. [Bar chart: accuracy and F1 for Unigrams, LDA-Topics, Semantic Concepts and SS-Patterns.]
  28. Within-Pattern Sentiment Consistency • Refers to the percentage of words having similar sentiment within a given pattern. • Strongly consistent patterns are those whose terms have similar sentiment.
  29. Within-Pattern Sentiment Consistency • STS-Entity dataset: 58 entities, 14 SS-Patterns. Consistency(Pattern12) = 88.89% (strongly consistent). Consistency(Pattern5) = 50% (poorly consistent). Average sentiment consistency over the 14 SS-Patterns = 88%.
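Slides 28 and 29 describe consistency as the percentage of words in a pattern that share a similar sentiment, without giving the exact formula. One plausible reading, used here purely as an assumption, is the share of terms that carry the pattern's most common sentiment label. The function name `sentiment_consistency` and the example labels are hypothetical and do not come from the STS-Entity dataset.

```python
from collections import Counter


def sentiment_consistency(labels):
    """Percentage of terms carrying the most frequent sentiment label.

    Assumed definition: the slides only say "percentage of words having
    similar sentiment within a given pattern".
    """
    if not labels:
        raise ValueError("a pattern must contain at least one term")
    majority_count = Counter(labels).most_common(1)[0][1]
    return 100.0 * majority_count / len(labels)


# Hypothetical nine-term pattern: eight negative terms and one positive term.
strong = sentiment_consistency(["negative"] * 8 + ["positive"])
# Hypothetical two-term pattern split evenly between the two labels.
weak = sentiment_consistency(["positive", "negative"])
```

Under this reading, a pattern in which eight of nine terms agree scores about 88.89 and an evenly split pattern scores 50, which is the kind of contrast the slide draws between strongly and poorly consistent patterns.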
  30. Conclusion • We proposed a new approach for automatically extracting patterns from the contextual semantic and sentiment similarities of words in tweets. • We used the patterns as features in tweet- and entity-level sentiment classification tasks. • SS-Patterns consistently outperformed the syntactic and semantic feature types in both entity- and tweet-level sentiment analysis. • We conducted a quantitative analysis on a sample of our extracted SS-Patterns and showed that the patterns are strongly consistent with the sentiment of the words within them.
  31. Thank You. Email: hassan.saif@open.ac.uk. Twitter: hrsaif. Website: tweenator.com
