A review on sentiment analysis and emotion detection from text
Adnan Nawaz
MSCS-II
FA21-RCS-002
Advanced Data Mining
1
Nandwani, P., & Verma, R. (2021). A review on sentiment analysis and emotion detection
from text. Social Network Analysis and Mining, 11(1), 1-19.
Table of contents
• Abstract
• Introduction
• Use of Social Media
• Review of Techniques of S & E Analysis
• Levels of Sentiment Analysis
• Emotion Models
• Basic Steps in Sentiment / Emotion Detection
• Overview of Dataset Used
• Techniques for Sentiment Analysis and Emotion Detection
• Challenges in Sentiment Analysis and Emotion Detection
• Conclusion
Abstract
• Social networking platforms are used for communicating feelings.
• People use textual content, pictures, audio, and video to express their feelings.
• A massive amount of data is generated.
• Sentiment analysis (SA) allows this data to be processed rapidly.
• SA recognizes polarity in text.
• It assesses whether the author has a positive, negative, or neutral attitude toward an item, administration, location, individual, etc.
• Emotion detection determines an individual's precise emotional/mental state.
Topics
• Levels of sentiment analysis
• Various emotion models
• The process of sentiment analysis and emotion detection from text
• Challenges during sentiment and emotion analysis
Introduction
• Sentiment analysis (SA) and emotion recognition (ER) are critical areas of NLP.
• SA determines whether data is positive, negative, or neutral.
• ER identifies emotion types such as furious, cheerful, or depressed.
• People use social media to communicate their feelings, arguments, and opinions.
• Users post feedback and reviews on various products and services.
• Ratings and reviews encourage vendors and service providers.
• SA and ER transform unstructured data into meaningful insights for decision making.
Use of Social Media
• Businesses broadcast information about products and collect client feedback.
• Feedback is valuable not just for business marketers, who use it to measure customer satisfaction.
• Sentiment analysis helps marketers understand their customers' perspectives.
• The rise of social media has made this easier and faster.
Healthcare Sector
• Social media has become an essential source of health-related information.
• Health practitioners must use automated sentiment and emotion analysis to save patient
Education Sector
• Sentiment analysis plays a critical role for both students and teachers.
• Enthusiasm, talent, and dedication determine a teacher's efficiency.
• Timely feedback from students helps improve teaching approaches.
• Sentiment and emotion analysis can be applied to textual feedback.
• Institutes use social media for advertising and marketing purposes.
• Students and guardians conduct online research about institutes and courses.
• Sentiment and emotion analysis can help students select the best institute or teacher.
Techniques of S & E Analysis
• Three techniques for sentiment and emotion analysis:
  1) Lexicon based,
  2) Machine learning based, and
  3) Deep learning based.
• Researchers face significant challenges, including:
  1) Dealing with context,
  2) Ridicule,
  3) Statements conveying several emotions,
  4) Spreading Web slang, and
  5) Lexical and syntactical ambiguity.
Sentiment Analysis
• A process of obtaining meaningful information and semantics from text using natural language processing techniques.
• Big data is generated through social media.
• Sentiment analysis is used to analyze it effectively and efficiently.
• Sentiment is not restricted to just positive or negative.
• It can also be agree or disagree, good or bad.
• It can be expressed on a 5-point scale: strongly disagree, disagree, neutral, agree, or strongly agree.
Example
• Reviews of European and US destinations were labeled on a scale of 1 to 5.
  - E.g., 1 or 2 stars for negative polarity.
• Gräbner et al. (2012) built a domain-specific lexicon:
  - It consists of tokens with their sentiment values.
  - It was built from customer reviews in the tourism domain.
  - Reviews carry 5-star ratings from terrible to excellent.
Levels of Sentiment Analysis
• Sentiment analysis is possible at three levels:
• Sentence level
  - The text is broken down into sentences.
• Document level
  - Sentiment is detected for the entire document.
  - The goal is to extract the global sentiment.
  - Documents contain redundant local patterns and lots of noise.
  - The link between words and phrases must be taken into account.
• Aspect level
  - The opinion about a specific aspect or feature is determined.
  - Example: "The speed of the processor is high, but this product is overpriced."
  - Here, speed and cost are two aspects.
Aspect-level sentiment analysis
• Devi Sri Nandhini and Pradeep (2020) proposed an algorithm to extract:
  - Implicit aspects from documents,
  - By exploiting the relation between opinionated words (adjectives) and explicit aspects (nouns).
• Ma et al. (2019) addressed two issues:
  - Different polarities of various aspects in a single sentence.
  - Explicit position of context in an opinionated sentence.
• They built a two-stage model based on LSTM:
  - Context words near an aspect are more relevant and
  - Need greater attention than farther context words.
Stages
• Stage one:
  - The model exploits multiple aspects in a sentence one by one with a position attention mechanism.
• Stage two:
  - Identifies (aspect, sentence) pairs according to the position of the aspect and the context around it, and
  - Calculates the polarity of each pair simultaneously.
Emotion Detection
• The process of identifying a person's various feelings or emotions.
  - For example, joy, sadness, or fury.
• Physical cues such as heart rate, shivering of hands, sweating, and voice pitch also convey emotional state.
• Detecting emotion from text alone is difficult.
• New slang and terminology are constantly being introduced, e.g., LOL, which makes emotion detection challenging.
Emotion Models
• Dimensional emotion model:
  - Represents emotions based on three parameters: valence, arousal, and power.
  - Valence means polarity.
  - Arousal means how exciting a feeling is, e.g., delighted is more exciting than happy.
  - Power signifies restriction over emotion.
Dimensional model of emotions
• These parameters decide the position of psychological states in 2-dimensional space.
Emotion Models
• Categorical emotion model:
  - Emotions are defined discretely, such as anger, happiness, sadness, and fear.
  - Depending on the model, emotions are categorized into four, six, or eight categories.
Models used by Authors
Authors | Model | Emotions | Purpose
Batbaatar & Becker | Ekman's Model | Six | -
Sailunaz & Alhajj | Ekman's Model | Six | Tweets
Robert | Ekman with "Love" state | Seven | Tweets
Ahmad | Wheel of Emotion Model by Plutchik | Nine states | Labeling Hindi sentences
Laubert & Parlamis | Shaver | Three | -
Common states in various Models
Basic Steps in Sentiment / Emotion Detection
Pre-processing of text
• Social media posts, audits, comments, remarks, and criticisms are highly unstructured.
• Data cleaning is necessary.
• It includes tokenization, stop word removal, POS tagging, etc.
Tokenization
• Tokenization breaks text into individual words (tokens):
  - Input: "this place is so beautiful"
  - After tokenization it becomes "this," "place," "is," "so," "beautiful."
• The text is also converted into a standard form.
• This includes correcting the spelling of words, etc.
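The tokenization example above can be sketched in a few lines of Python. This is a minimal illustration that assumes simple regex-based splitting with the standard `re` module; the `tokenize` helper is hypothetical, and real pipelines usually rely on a dedicated NLP tokenizer.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    # \w+ matches runs of letters, digits, and underscores; punctuation is dropped.
    return re.findall(r"\w+", text.lower())

tokens = tokenize("this place is so beautiful")
print(tokens)
```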
Removal of Stop Words
• Stop words such as "is," "at," "an," and "the" are removed.
• This avoids unnecessary computations.
• POS tagging helps in finding various aspects in a sentence.
• Nouns or noun phrases generally describe the aspects,
• while sentiments and emotions are conveyed by adjectives.
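As a rough sketch of stop word removal, the snippet below filters a token list against a tiny stop-word set. The set contains only the four words named on the slide, and `remove_stop_words` is a hypothetical helper; production stop-word lists are much longer and language-specific.

```python
# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"is", "at", "an", "the"}

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words(["the", "food", "at", "the", "hotel", "is", "great"])
print(filtered)
```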
Stemming and lemmatization
• Two crucial steps of pre-processing.
• Stemming:
  - Words are converted to their root form.
  - The terms "argued" and "argue" become "argue."
• Lemmatization:
  - Turns a word into its base word.
  - The term "caught" is converted into "catch."
• Removing numbers and applying lemmatization enhanced accuracy.
• Removing punctuation did not affect accuracy.
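The difference between the two steps can be illustrated with a toy sketch. Both the suffix list and the lemma table below are assumptions made up for this example, not a real stemming algorithm or lexical database. Note that this toy stemmer maps both "argued" and "argue" to the shared stem "argu"; rule-based stemmers often yield truncated stems that are not dictionary words, whereas lemmatization returns a real base word.

```python
# Toy suffix list and lemma table (assumptions for illustration only).
SUFFIXES = ("ing", "ed", "e", "s")
LEMMAS = {"caught": "catch", "went": "go"}

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Look the word up in the lemma table, falling back to the word itself."""
    return LEMMAS.get(word, word)

print(stem("argued"), stem("argue"))  # both forms collapse to one stem
print(lemmatize("caught"))
```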
Feature extraction
• The process of converting or mapping text or words to real-valued vectors is called word vectorization.
• A document is broken down into sentences, and the sentences into words.
• In the resulting matrix, each row represents a sentence or document,
• while each feature column represents a word.
Feature extraction
• The most straightforward method is 'Bag of Words' (BOW).
• A fixed-length vector of counts is defined.
• Each entry corresponds to a word in a pre-defined dictionary.
• An entry is 0 if its word does not appear in the sentence; otherwise it is the number of occurrences (>= 1).
• The vector length always equals the number of words in the dictionary.
• Easy to implement.
• Drawbacks:
  - Produces a sparse matrix.
  - Loses the order of words in the sentence.
  - Does not capture the meaning of a sentence.
• Example: to represent the text "Are you enjoying reading"
  - With the dictionary (I, Hope, you, are, enjoying, reading), the vector is (0,0,1,1,1,1).
• Can be improved by:
  - Pre-processing of text, and
  - Utilizing n-grams and TF-IDF.
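The Bag of Words example above can be reproduced with a short sketch. It assumes whitespace tokenization and case-insensitive matching; `bag_of_words` is a hypothetical helper, not a library function.

```python
from collections import Counter

def bag_of_words(text, dictionary):
    """Return a count vector with one entry per dictionary word."""
    # Count how often each lowercase token occurs in the text.
    counts = Counter(text.lower().split())
    # Words absent from the text get a count of 0.
    return [counts[word.lower()] for word in dictionary]

dictionary = ["I", "Hope", "you", "are", "enjoying", "reading"]
vector = bag_of_words("Are you enjoying reading", dictionary)
print(vector)
```

The vector has the same length as the dictionary regardless of the sentence length, which is why the resulting matrix is sparse when the dictionary is large.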
N-Gram
• An excellent option to preserve the order of words in the sentence vector representation.
• The value of n can be any natural number.
• Example: "To teach is to touch a life forever" with n = 3 (called a trigram)
  - Generates 'to teach is,' 'teach is to,' 'is to touch,' 'to touch a,' 'touch a life,' 'a life forever.'
• Performs better than BOW.
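The trigram example can be generated with a sliding window over the token list. This is a minimal sketch assuming whitespace tokenization; the `ngrams` helper is hypothetical.

```python
def ngrams(text, n):
    """Return all contiguous word n-grams of the text as strings."""
    tokens = text.lower().split()
    # Slide a window of n tokens across the sentence.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

trigrams = ngrams("To teach is to touch a life forever", 3)
print(trigrams)
```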
Term frequency-inverse document frequency (TF-IDF)
• Used for feature extraction.
• Represents text in matrix form.
• Ahuja et al. (2019) implemented six pre-processing techniques and compared two feature extraction techniques to identify the best approach.
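The slide does not give a formula, so the sketch below assumes one common variant: term frequency as the word count divided by the document length, and inverse document frequency as log(N / df), where N is the number of documents and df the number of documents containing the word. The `tf_idf` helper and the three toy documents are made up for illustration; libraries typically use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return (vocabulary, matrix): one row per document, one column per word."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for doc in tokenized for word in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in tokenized for word in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        # tf = count / document length, idf = log(N / df); assumes non-empty documents.
        row = [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocabulary]
        matrix.append(row)
    return vocabulary, matrix

docs = ["the room was clean", "the staff was rude", "clean room friendly staff"]
vocab, matrix = tf_idf(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(value, 3) for value in row])
```

Words that occur in only one document (such as "rude") receive a higher weight than words shared across documents (such as "the"), which is the intended effect of the inverse document frequency term.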
Techniques for sentiment analysis and emotion detection
Lexicon based approach
• This method maintains a word dictionary.
• Each positive and negative word is assigned a sentiment value.
• The mean of these values is used to calculate the sentiment of the entire sentence or document.
• Two approaches:
  1. Dictionary-based approach:
     - A dictionary maintains the words of some language.
     - Less efficient.
     - Multiple domains with a data-driven approach.
  2. Corpus-based approach:
     - A corpus is a random sample of text in some language.
     - Yields domain-specific sentiment words.
     - Poor generalization.
     - Excellent performance within a particular domain.
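The mean-value scoring described above can be sketched as follows. The mini-lexicon, its sentiment values, and the zero threshold for polarity are all assumptions chosen for illustration; real lexicons contain thousands of scored entries.

```python
# Hypothetical mini-lexicon: word -> sentiment value in [-1, 1].
LEXICON = {"good": 1.0, "excellent": 1.0, "clean": 0.5, "bad": -1.0, "overpriced": -0.5}

def lexicon_score(text):
    """Mean sentiment value of the words found in the lexicon (0.0 if none)."""
    values = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(values) / len(values) if values else 0.0

def polarity(score):
    """Map a mean score to a polarity label (thresholds are an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for sentence in ["excellent food and good service", "the room was clean but overpriced"]:
    score = lexicon_score(sentence)
    print(sentence, score, polarity(score))
```

Words missing from the lexicon are simply ignored, which hints at why a lexicon built for one domain generalizes poorly to another.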
Machine Learning based approach
• The dataset is divided into two parts:
  - One for training and one for testing.
• Supervised classification algorithms:
  - Naive Bayes, support vector machine (SVM), decision trees, etc.
• Gamon (2004) applied an SVM:
  - Accuracy up to 85.47%.
• Ye et al. (2009) worked with SVM, the N-gram model, and Naive Bayes:
  - Sentiment classification of reviews on seven popular destinations in Europe and the USA.
  - Accuracy of up to 87.17%.
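To make the train/test workflow concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written from scratch. The six labeled sentences are invented toy data, not the datasets used by Gamon (2004) or Ye et al. (2009), and the helper names are hypothetical; the accuracy printed for such a tiny sample says nothing about real-world performance.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Collect label and per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy labeled data (made up for illustration).
data = [
    ("great hotel and friendly staff", "positive"),
    ("lovely view and great food", "positive"),
    ("dirty room and rude staff", "negative"),
    ("terrible food and dirty bathroom", "negative"),
    ("great food and lovely room", "positive"),
    ("rude staff and terrible view", "negative"),
]
train, test = data[:4], data[4:]  # split into training and testing parts
model = train_naive_bayes(train)
predictions = [predict(model, text) for text, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```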
Deep Learning based Approach
• These algorithms detect sentiment from text without feature engineering.
• Multiple deep learning algorithms:
  - RNN, CNN
• Authors applied the model to Cornell movie review data:
  - More accurate than SVM.
• Pasupa and Ayutthaya (2019) used CNN, LSTM, and Bi-LSTM:
  - On a Thai children's tale dataset.
  - With or without features:
    - POS-tagging
    - Thai2Vec (word embedding trained from Thai Wikipedia)
    - Sentic (to understand the sentiment of the word)
  - Best performance with CNN.
Transfer Learning Approach and Hybrid Approach
• Transfer learning is a part of machine learning.
• A model trained on large datasets to resolve one problem can be applied to other related problems.
• Re-using a pre-trained model on related domains as a starting point can save time and produce more efficient results.
• Zhang et al. (2012) proposed a novel instance learning method:
  - Modeling the distribution between different domains.
  - Classified the datasets:
    - Amazon product reviews and
    - Twitter dataset into positive and negative sentiments.
Evaluation Metrics
Challenges in sentiment analysis and emotion detection
• A lot of data is in the form of informal text.
• It contains spelling mistakes, new slang, and incorrect use of grammar.
• Sometimes individuals do not express their emotions clearly.
  - E.g., "Y have u been soooo late?"
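One way to handle such informal text is to normalize it before analysis. The sketch below collapses elongated characters and expands slang using a tiny lookup table; the two-entry `SLANG` map and the `normalize` helper are assumptions for illustration. Collapsing every run of three or more repeated characters to one is a crude heuristic that can damage some words (for instance, "goood" would become "god").

```python
import re

# Hypothetical slang map (an assumption); a real one would be far larger.
SLANG = {"y": "why", "u": "you"}

def normalize(text):
    """Collapse elongated characters and expand known slang words."""
    # Collapse any character repeated three or more times to a single occurrence.
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    # Split into words and standalone punctuation marks.
    words = re.findall(r"\w+|[^\w\s]", text)
    joined = " ".join(SLANG.get(w, w) for w in words)
    # Re-attach punctuation to the preceding word.
    return re.sub(r"\s+([^\w\s])", r"\1", joined)

cleaned = normalize("Y have u been soooo late?")
print(cleaned)
```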
Conclusion
• A review of the existing techniques for both emotion and sentiment detection is presented.
• The lexicon-based technique performs well in both.
• The dictionary-based approach is quite adaptable and straightforward to apply.
• The corpus-based method is built on rules.
• Machine and deep learning algorithms depend on dataset size and pre-processing.
• The LSTM model can cover long-term dependencies and extract features very well.
• The performance of the various approaches depends on pre-processing and feature extraction.
Any Questions?
THANK YOU
