Sentiment Analysis
Presented by
Sreerup Karmakar
Guided by
Prof. xxxx
University of Engineering & Management
INTRODUCTION
• Sentiment analysis is the automatic extraction of subjective content from text and the prediction of its polarity, such as positive or negative.
• Subjectivity is the linguistic expression of somebody’s emotions, opinions, or sentiments.
Subjectivity Analysis: Review Mining, Sentiment Analysis, Opinion Mining
Sentiment Analysis: Sentence Level, Document Level, Feature Level
Analysis of Previous Research
• Pang et al. (2002) [1] studied the effect of various machine learning techniques (Naïve Bayes, Maximum Entropy, and Support Vector Machines) in the specific domain of movie reviews. They achieved an accuracy of 82.9% using an SVM with a unigram model.
• Pang and Lee (2008) [2] give a survey of sentiment analysis. Researchers have also analyzed the brand impact of microblogging.
• Turney (2002) [3] presents a simple algorithm, called semantic orientation, for detecting sentiment.
• Pang and Lee (2004) [4] present a hierarchical scheme in which text is first classified as containing sentiment and then classified as positive or negative.
• Emoticons have also been used as labels for positive and negative sentiment. This is very relevant to Twitter because many users include emoticons in their tweets.
BACKGROUND
• Existing approaches can be classified into two categories:
• Keyword spotting approach
– Also known as Naive Dictionary Lookup. It is the most naïve approach, as it categorizes the text based on the presence of unambiguous words such as happy, sad, awesome, and worst.
– Advantage: It is very easy to understand and implement.
– Disadvantage: It handles negation poorly, which results in low accuracy (54%).
• Statistical methods
– Naïve Bayes Classifier
– Support Vector Machine (SVM)
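The keyword spotting approach and its weakness with negation can be illustrated with a minimal sketch. The two word lists below are hypothetical and far smaller than any real lexicon, and the function name is an assumption made for this example.

```python
# Hypothetical mini-lexicons; a real system would use a much larger word list.
POSITIVE_WORDS = {"happy", "awesome", "good", "great"}
NEGATIVE_WORDS = {"sad", "worst", "bad", "terrible"}


def keyword_spotting(text: str) -> str:
    """Label text by counting unambiguous positive and negative keywords."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(keyword_spotting("What an awesome movie!"))   # positive
print(keyword_spotting("I am not happy with it."))  # positive (wrong: "not" is ignored)
```

The second call shows why accuracy stays low: the lookup sees "happy" and never accounts for the negation in front of it.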
PHASES
• Pre-processing phase: The data entered by the user is first cleaned to reduce noise so that the keywords can be analysed.
• Feature extraction phase: The keywords are tokenized and passed on for analysis.
• Classification phase: The extracted features are classified. Depending on the algorithm used, each text is assigned to a category such as happy or sad.
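The first two phases can be sketched roughly as follows. The cleaning rules and the stop-word list are assumptions chosen for illustration, not the exact steps used in this work. The classification phase would take the resulting word counts as input.

```python
import re
from collections import Counter

# Hypothetical stop-word list, used only for illustration.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of"}


def preprocess(text: str) -> str:
    """Pre-processing phase: lowercase the text and strip noisy characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


def extract_features(clean_text: str) -> Counter:
    """Feature extraction phase: tokenize and count the remaining keywords."""
    tokens = [t for t in clean_text.split() if t not in STOP_WORDS]
    return Counter(tokens)


raw = "The movie was GREAT, great fun!!! http://example.com"
clean = preprocess(raw)
print(clean)                    # the movie was great great fun
print(extract_features(clean))  # great counted twice, movie and fun once
```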
• Naïve Bayes Classifier
– A simple classifier based on Bayes’ theorem.
– It takes a ‘bag of words’ approach to subjectivity analysis: text is represented as the collection of its words, discarding grammar and word order but keeping multiplicity.
– Applications: sentiment detection, email spam detection, document categorization, etc.
– Superior in terms of CPU and memory utilization, as shown by Huang, J. (2003).
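A bag-of-words Naïve Bayes classifier fits in a few lines. The sketch below is a minimal multinomial variant. Add-one (Laplace) smoothing and log-probabilities are assumptions added here to avoid zero probabilities and underflow, and the training sentences are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing assumed)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)  # log P(c)
            for word in document.lower().split():
                if word in self.vocabulary:        # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)  # log P(w | c)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training set, invented for this example.
train_docs = ["awesome happy movie", "happy great fun", "worst sad movie", "sad boring plot"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_docs, train_labels)

print(model.predict("happy fun movie"))  # positive
print(model.predict("sad plot"))         # negative
```

Word order plays no role in the score, which is exactly the bag-of-words simplification.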
• Probabilistic Analysis of Naïve Bayes
For a document d and class c, by Bayes’ theorem:
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
where
$P(c \mid d)$ = probability of the class given the data
$P(d \mid c)$ = probability of the data given the class
$P(c)$ = probability of the class
$P(d)$ = probability of the data
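Turning Bayes’ theorem into a classifier takes two further steps, sketched below. Here the document d is written as its words w_1, ..., w_n, and the words are assumed to be conditionally independent given the class. That is the "naïve" assumption, stated here as background rather than taken from the slide. Because P(d) is the same for every class, it drops out of the comparison.

```latex
\begin{align*}
\hat{c} &= \arg\max_{c} P(c \mid d)
         = \arg\max_{c} \frac{P(d \mid c)\,P(c)}{P(d)}
         = \arg\max_{c} P(d \mid c)\,P(c) \\
        &= \arg\max_{c} P(c) \prod_{i=1}^{n} P(w_i \mid c)
         = \arg\max_{c} \Bigl[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Bigr]
\end{align*}
```

The last form, a sum of logarithms, is the one usually implemented, since multiplying many small probabilities underflows quickly.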
• Support Vector Machine
– A Support Vector Machine (SVM) is a supervised learning technique from the field of machine learning that is applicable to classification.
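To make the idea concrete, here is a minimal sketch of a linear SVM. It is trained by sub-gradient descent on the regularised hinge loss, which is swapped in here for brevity in place of a quadratic-programming solver. The feature encoding (counts of positive and negative words per text), the toy data, and all hyperparameter values are assumptions for illustration.

```python
def train_linear_svm(samples, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit weights and bias by sub-gradient descent on the regularised hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The regularisation term shrinks the weights on every step.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:                 # sample lies inside or beyond the margin
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else -1


# Toy features: [number of positive words, number of negative words] per text.
samples = [[2, 0], [3, 1], [1, 0], [0, 2], [1, 3], [0, 1]]
labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(samples, labels)

print(predict(weights, bias, [4, 0]))
print(predict(weights, bias, [0, 4]))
```

Production libraries solve the same optimisation far more efficiently; the loop above only shows what is being optimised.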
RESULT
Approaches vs Accuracy
Approach                  Accuracy (%)
Naïve Dictionary          54
Naïve Bayes               79
SVM (Linear)              78
SVM (Polynomial Kernel)   81
SVM (RBF Kernel)          83
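The three SVM variants compared here differ in the kernel function used to measure similarity between two feature vectors. The sketch below writes out the standard definitions. The default values for `degree`, `coef0`, and `gamma` are assumptions, since the settings behind the reported accuracies are not stated.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))      # 11.0
print(polynomial_kernel(u, v))  # 144.0
print(rbf_kernel(u, u))         # 1.0
```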
LIMITATIONS
• Sarcasm detection is still a major issue.
• The biggest limitation of SVM lies in the choice of the kernel.
• A second limitation of SVM is speed and size, in both training and testing.
• From a practical point of view, perhaps the most serious problem with SVMs is the high algorithmic complexity and extensive memory requirements of the required quadratic programming in large-scale tasks.
• The system does not check whether the document actually contains sentiment.
– For example, “Sun is a star.” is a fact that contains no sentiment and hence must be rejected.
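One simple way to add the missing check is a subjectivity filter that runs before classification. The sketch below is only an illustration of the idea. The word list is hypothetical, and a practical filter would more likely be a trained subjectivity classifier than a lookup.

```python
# Hypothetical list of opinion-bearing words; not taken from the slides.
SUBJECTIVE_WORDS = {"love", "hate", "awesome", "worst", "happy", "sad", "good", "bad"}


def contains_sentiment(text: str) -> bool:
    """Return True when at least one opinion-bearing word occurs in the text."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return bool(words & SUBJECTIVE_WORDS)


print(contains_sentiment("Sun is a star."))            # False, so the text is rejected
print(contains_sentiment("This is the worst phone."))  # True, so the text is classified
```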
PURPOSE
• With many websites enabling a review option for products, the amount of natural-language data keeps increasing.
• Analysis of this data helps both customers and organizations keep track of product activity.
• Data from feedback forms can be processed to find people’s sentiment regarding a teacher, subject, or college.
Scope for Future Work
• Neural networks may prove to be better, as pointed out by the Stanford Sentiment Treebank research.
• Develop models that capture sarcasm to at least some degree.
• Documents should first be classified as containing sentiment or not.
• Opinionated documents and reviews should be separated from factual documents.
• Break each review down by the different sentiments embedded in it; a review may contain both positive and negative sentiment, which should be reported appropriately.
• Use different features, such as bigrams and trigrams, along with different models.
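Bigram and trigram features are straightforward to generate from a token list. The helper below is a minimal sketch; the function name and the sample sentence are assumptions for this example.

```python
def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "this movie is not good".split()
print(ngrams(tokens, 2))  # includes ('not', 'good')
print(ngrams(tokens, 3))  # includes ('is', 'not', 'good')
```

A bigram such as ("not", "good") keeps the negation attached to the word it modifies, which unigram features cannot represent.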
CONCLUSION
• Lexical resources have been developed to capture the sentiment-related nature of words.
• Subjective extracts provide better accuracy for sentiment prediction.
• Several approaches use algorithms such as Naïve Bayes and clustering to perform sentiment analysis.
• The cognitive angle to sentiment analysis can be explored in the future.
REFERENCE
[1] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan, “Thumbs up? Sentiment Classification using Machine Learning Techniques,” Proceedings of EMNLP 2002, pp. 79–86, 2002.
[2] Bo Pang and Lillian Lee, “Opinion Mining and Sentiment Analysis,” Foundations and Trends in Information Retrieval 2(1-2), pp. 1–135, 2008.
[3] Peter Turney, “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews,” Proceedings of the Association for Computational Linguistics, pp. 417–424, 2002.
[4] Bo Pang and Lillian Lee, “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts,” Proceedings of the ACL, 2004.
