Presentation
On
SENTIMENTAL ANALYSIS IN TWITTER
IN
B.Tech (IT)
Session: Jul-Dec 2017
Submitted To:
Department of Computer Science
AIM & ACT
Submitted By:
Name: KHUSHBOO GUPTA
ID: BTBTI15052
Exam Roll No: 8244
Class S.No: 34
S.NO.  TOPIC NAME
1      Introduction
2      Methodology
3      Why Twitter?
4      Classification of Techniques
5      Naïve Bayes Approach for SA
6      Applications
7      Future Scope
8      Conclusion
9      References
WHY USE SENTIMENT ANALYSIS?
• Promotion: Is this review positive or negative?
• Products: What do people think about the new iPhone?
• Politics: What do people think about this candidate or issue?
• Prediction: Predict election outcomes or market trends from sentiment.
WHAT IS SENTIMENT ANALYSIS?
Sentiment analysis is the process of determining the feeling behind a piece of text, a conversation or a social media update.
• It classifies a given tweet in terms of its polarity.
• It is used on Twitter and other social channels.
• 86% of marketers value it highly.
• The opinion is classified as positive, negative or neutral.
METHODOLOGY
1. DATA COLLECTION: Sentiments are collected in the form of tweets from Twitter or any other platform.
2. TOKENIZER:
• Filtering of the text.
• The text goes through a POS (part-of-speech) tagger:
  - nouns and pronouns are removed
  - the intensity of each word is measured, i.e. whether it is used as a verb or an adjective
• Remove slang words.
• Remove URLs (e.g. friendorfollow.com/twitter/most-tweets/).
• Remove hashtag symbols (#) and numbers.
• Replace sequences of repeated characters, e.g. "coooooool" becomes "cool".
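The cleaning steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenter's implementation: the function name `clean_tweet` and the regular expressions are assumptions, and POS tagging and slang removal are left out because they need external resources.

```python
import re

# Assumed patterns: URLs with a scheme or "www.", plus bare "site.tld/path" links.
URL_RE = re.compile(r"(?:https?://|www\.)\S+|\S+\.\w{2,}/\S*")
# A word character repeated three or more times in a row.
REPEAT_RE = re.compile(r"(\w)\1{2,}")


def clean_tweet(text: str) -> list[str]:
    """Return lower-cased tokens after the basic filtering steps."""
    text = URL_RE.sub(" ", text)         # remove URLs
    text = text.replace("#", " ")        # remove the hashtag symbol, keep the word
    text = re.sub(r"\d+", " ", text)     # remove numbers
    text = REPEAT_RE.sub(r"\1\1", text)  # "coooooool" -> "cool"
    return text.lower().split()


tokens = clean_tweet("This is coooooool #fun 123 http://t.co/abc")
```

Collapsing a run to two characters is a design choice: it turns "coooooool" into "cool" but would leave a word such as "sooo" as "soo", so a dictionary lookup may still be needed afterwards.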
3. NEGATION:
Handling negation is very important in sentiment analysis, because "not" can also appear in a positive expression such as "not only". Each negation must therefore be resolved so that there is no confusion.
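One simple way to handle negation is to tag the word that follows a negation, while treating "not only" as non-negating. The sketch below is an assumption-laden illustration: the word list `NEGATIONS`, the `NOT_` prefix and the one-word scope are all choices made here, not taken from the slides.

```python
# Hypothetical negation word list, for illustration only.
NEGATIONS = {"not", "no", "never", "cannot"}


def mark_negation(tokens: list[str]) -> list[str]:
    """Prefix the token after a negation with NOT_, except after 'not only'."""
    marked = []
    negate_next = False
    for i, token in enumerate(tokens):
        low = token.lower()
        following = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if low in NEGATIONS:
            # "not only" adds emphasis instead of reversing the sentiment.
            negate_next = not (low == "not" and following == "only")
            marked.append(token)
        elif negate_next:
            marked.append("NOT_" + token)
            negate_next = False
        else:
            marked.append(token)
    return marked


example = mark_negation("this is not good".split())
```

With this tagging, "good" and "NOT_good" become different features, so a classifier can learn opposite polarities for them.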
4. FEATURE EXTRACTION:
• Percentage of capitalized words
• No. of negative/positive capitalized words
• No. of positive/negative hashtags
• No. of positive/negative emoticons
• No. of negations
• No. of special characters (for example: & $ @ %)
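The features listed above could be computed roughly as follows. This is a minimal sketch under stated assumptions: the function name `extract_features`, the tiny word lists, the emoticon sets and the treatment of "capitalized" as fully upper-case words are all illustrative choices, not part of the original slides.

```python
import re

# Tiny illustrative lexicons (assumptions, not a real sentiment dictionary).
POSITIVE = {"happy", "good", "great"}
NEGATIVE = {"sad", "poor", "bad"}
NEGATIONS = {"not", "no", "never"}
POS_EMOTICONS = (":)", ":-)", ":D")
NEG_EMOTICONS = (":(", ":-(")


def extract_features(tweet: str) -> dict:
    words = re.findall(r"[#@]?\w+", tweet)
    plain = [w for w in words if w[0] not in "#@"]            # ordinary words
    caps = [w for w in plain if w.isupper() and len(w) > 1]   # fully upper-case words
    hashtags = [w[1:].lower() for w in words if w.startswith("#")]
    return {
        "pct_capitalized": 100.0 * len(caps) / len(plain) if plain else 0.0,
        "pos_capitalized": sum(w.lower() in POSITIVE for w in caps),
        "neg_capitalized": sum(w.lower() in NEGATIVE for w in caps),
        "pos_hashtags": sum(h in POSITIVE for h in hashtags),
        "neg_hashtags": sum(h in NEGATIVE for h in hashtags),
        "pos_emoticons": sum(tweet.count(e) for e in POS_EMOTICONS),
        "neg_emoticons": sum(tweet.count(e) for e in NEG_EMOTICONS),
        "negations": sum(w.lower() in NEGATIONS for w in plain),
        "special_chars": sum(ch in "&$@%" for ch in tweet),
    }


features = extract_features("This movie is GREAT :) not BAD #good @bob 100%")
```

The resulting dictionary can be turned into a numeric vector and passed to any classifier.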
5. SENTIMENT CLASSIFICATION AT SENTENCE LEVEL:
The same task can be carried out at the sentence level. For a sentence S, perform two sub-tasks:
• Subjectivity classification: determine whether S is an objective sentence or a subjective sentence, according to whether it expresses an opinion.
• Classification of subjective sentences: if S is subjective, determine whether it expresses a positive opinion or a negative opinion.
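The two sub-tasks (subjectivity classification first, then polarity classification of subjective sentences) can be illustrated with a toy lexicon-based sketch. The word lists, the function name `classify_sentence` and the rule "a sentence with no opinion word is objective" are simplifying assumptions made for this example only.

```python
# Toy opinion lexicons, assumed for illustration.
POSITIVE = {"happy", "good", "great", "loved"}
NEGATIVE = {"sad", "poor", "bad", "hated"}


def classify_sentence(sentence: str) -> str:
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)

    # Sub-task 1: subjectivity classification.
    if pos + neg == 0:
        return "objective"

    # Sub-task 2: polarity of a subjective sentence.
    # Ties fall to "negative" here, which is an arbitrary choice.
    return "positive" if pos > neg else "negative"


labels = [classify_sentence(s) for s in
          ("The movie runs for two hours.", "A great movie.", "Poor acting.")]
```

A real system would replace both steps with trained classifiers, but the control flow stays the same: polarity is only decided for sentences judged subjective.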
PREDICTIONS
The extracted features are then fed to the classifier.
The model is built to predict the sentiment of new tweets.
WHY TWITTER?
• Twitter is a social networking and microblogging service.
• It allows users to post real-time messages called tweets.
• Messages are restricted to 140 characters in length, so people use acronyms, make spelling mistakes, and use emoticons and other characters that carry a special meaning.
Brief terminology associated with tweets:
• EMOTICONS: express the user's mood.
• TARGET: users use the '@' symbol to refer to other users.
• HASHTAG: users usually use hashtags to mark topics.
CLASSIFICATION OF TECHNIQUES USED FOR SENTIMENT ANALYSIS
Sentiment Analysis
• Machine Learning Approach
  - Supervised Learning
    · Decision Tree Identifiers
    · Linear Classifiers: Support Vector Machines, Neural Networks
    · Rule-Based Classifiers
    · Probabilistic Classifiers: Naïve Bayes Classifiers, Bayesian Networks, Maximum Entropy
  - Unsupervised Learning
• Lexicon-Based Approach
  - Dictionary-Based Approach
  - Corpus-Based Approach: Statistical Semantics
MACHINE LEARNING APPROACH
• Uses ML algorithms and linguistic features.
• Optimises the performance of the system using example data.
• Example: big data frameworks such as Mahout and Pentaho provide libraries and plugins.
The ML approach requires two sets of documents:
1. Training set: used by the classifier to learn the document characteristics.
2. Testing set: used to validate the classifier's performance.
The ML approach is divided into:
• Supervised methods: use a large number of labelled training documents.
• Unsupervised methods
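The split into a training set and a testing set can be sketched as below. The helper name `split_documents`, the 20% test fraction and the fixed seed are assumptions chosen for illustration; the labelled sentences are the five examples used in the Naïve Bayes slides.

```python
import random


def split_documents(documents, test_fraction=0.2, seed=0):
    """Shuffle labelled documents and split them into (training, testing)."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)   # seeded so the split is reproducible
    n_test = max(1, int(len(docs) * test_fraction))
    return docs[n_test:], docs[:n_test]


labelled = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie, good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting, a good movie", "+"),
]
training_set, testing_set = split_documents(labelled)
```

The classifier is fitted only on `training_set`; `testing_set` is held back so that the measured performance reflects unseen documents.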
NAÏVE BAYES APPROACH FOR SENTIMENT ANALYSIS
Positive words: HAPPY, GOOD, GREAT
Negative words: SAD, POOR, BAD
Let us take 5 sentences:
1. I loved the movie.
2. I hated the movie.
3. A great movie, good movie.
4. Poor acting.
5. Great acting, a good movie.
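Word counts per sentence:
Sentence No. | I | Loved | The | Movie | Hated | A | Great | Good | Poor | Acting | Class
1            | 1 | 1     | 1   | 1     | 0     | 0 | 0     | 0    | 0    | 0      | Pos (+)
2            | 1 | 0     | 1   | 1     | 1     | 0 | 0     | 0    | 0    | 0      | Neg (-)
3            | 0 | 0     | 0   | 2     | 0     | 1 | 1     | 1    | 0    | 0      | Pos (+)
4            | 0 | 0     | 0   | 0     | 0     | 0 | 0     | 0    | 1    | 1      | Neg (-)
5            | 0 | 0     | 0   | 1     | 0     | 1 | 1     | 1    | 0    | 1      | Pos (+)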
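The word-count table can be rebuilt directly from the five sentences. This is a small sketch, assuming a simple whitespace tokenizer that strips commas and full stops; the names `tokenize` and `count_words` are illustrative.

```python
from collections import Counter

TRAINING = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie, good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting, a good movie", "+"),
]


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and strip commas and full stops from each word."""
    return [w.strip(".,").lower() for w in sentence.split() if w.strip(".,")]


def count_words(training):
    """Count how often each word occurs in each class."""
    counts = {"+": Counter(), "-": Counter()}
    for sentence, label in training:
        counts[label].update(tokenize(sentence))
    return counts


counts = count_words(TRAINING)
vocabulary = set(counts["+"]) | set(counts["-"])
total_words = {label: sum(c.values()) for label, c in counts.items()}
```

Here `total_words` gives 14 words for the positive class and 6 for the negative class, and `vocabulary` holds the 10 distinct words; these are the numbers used in the denominators of the probability calculations.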
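$$P(+) = \frac{3}{5}, \qquad P(-) = \frac{2}{5}$$

$$P(\text{word} \mid \text{label}) = \frac{\text{no. of times the word occurs in the label} + 1}{\text{no. of words in the label} + \text{no. of keywords}}$$

In this example the positive sentences contain 14 words in total, the negative sentences contain 6, and there are 10 distinct keywords.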
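The formula above is add-one (Laplace) smoothing. A minimal sketch with exact fractions follows; the function name `smoothed_likelihood` is an assumption, and the counts (14 positive words, 6 negative words, 10 keywords) come from the five example sentences.

```python
from fractions import Fraction


def smoothed_likelihood(word_count: int, words_in_label: int, keywords: int) -> Fraction:
    """P(word | label) with add-one smoothing, kept as an exact fraction."""
    return Fraction(word_count + 1, words_in_label + keywords)


p_movie_pos = smoothed_likelihood(4, 14, 10)   # "movie" occurs 4 times in positive sentences
p_hated_pos = smoothed_likelihood(0, 14, 10)   # "hated" never occurs in positive sentences
p_hated_neg = smoothed_likelihood(1, 6, 10)    # "hated" occurs once in negative sentences
```

The "+ 1" in the numerator is what keeps `p_hated_pos` above zero; without it, a single unseen word would force the whole product for that class to zero.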
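$$
\begin{aligned}
P(\text{I} \mid +) &= \frac{1+1}{14+10} = \frac{2}{24} = 0.0833 \\
P(\text{loved} \mid +) &= \frac{1+1}{14+10} = \frac{2}{24} = 0.0833 \\
P(\text{the} \mid +) &= \frac{1+1}{14+10} = \frac{2}{24} = 0.0833 \\
P(\text{movie} \mid +) &= \frac{4+1}{14+10} = \frac{5}{24} = 0.2083 \\
P(\text{hated} \mid +) &= \frac{0+1}{14+10} = \frac{1}{24} = 0.04166 \\
P(\text{a} \mid +) &= \frac{2+1}{14+10} = \frac{3}{24} = 0.125 \\
P(\text{great} \mid +) &= \frac{2+1}{14+10} = \frac{3}{24} = 0.125 \\
P(\text{good} \mid +) &= \frac{2+1}{14+10} = \frac{3}{24} = 0.125 \\
P(\text{poor} \mid +) &= \frac{0+1}{14+10} = \frac{1}{24} = 0.04166 \\
P(\text{acting} \mid +) &= \frac{1+1}{14+10} = \frac{2}{24} = 0.0833
\end{aligned}
$$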
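$$
\begin{aligned}
P(\text{I} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{loved} \mid -) &= \frac{0+1}{6+10} = \frac{1}{16} = 0.0625 \\
P(\text{the} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{movie} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{hated} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{poor} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{acting} \mid -) &= \frac{1+1}{6+10} = \frac{2}{16} = 0.125 \\
P(\text{a} \mid -) &= \frac{0+1}{6+10} = \frac{1}{16} = 0.0625 \\
P(\text{great} \mid -) &= \frac{0+1}{6+10} = \frac{1}{16} = 0.0625 \\
P(\text{good} \mid -) &= \frac{0+1}{6+10} = \frac{1}{16} = 0.0625
\end{aligned}
$$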
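Test sentence: "I hated the poor acting."

$$
\begin{aligned}
P(\text{positive}) &= P(+)\,P(\text{I} \mid +)\,P(\text{hated} \mid +)\,P(\text{the} \mid +)\,P(\text{poor} \mid +)\,P(\text{acting} \mid +) \\
&= 0.6 \times 0.0833 \times 0.04166 \times 0.0833 \times 0.04166 \times 0.0833 \\
&\approx 6.02 \times 10^{-7}
\end{aligned}
$$

$$
\begin{aligned}
P(\text{negative}) &= P(-)\,P(\text{I} \mid -)\,P(\text{hated} \mid -)\,P(\text{the} \mid -)\,P(\text{poor} \mid -)\,P(\text{acting} \mid -) \\
&= 0.4 \times 0.125 \times 0.125 \times 0.125 \times 0.125 \times 0.125 \\
&\approx 1.2207 \times 10^{-5}
\end{aligned}
$$

P(negative) > P(positive)
RESULT: There is more negativity in the tweet, so we label this tweet as NEGATIVE.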
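The final comparison can be checked with a short script. This is a sketch rather than a general classifier: the priors, the class word totals and the per-class counts for the five words of the test sentence are hard-coded from the example, and the names `PRIOR`, `COUNTS` and `score` are assumptions.

```python
from fractions import Fraction
from math import prod

PRIOR = {"+": Fraction(3, 5), "-": Fraction(2, 5)}
TOTAL_WORDS = {"+": 14, "-": 6}   # words per class in the five training sentences
KEYWORDS = 10                     # distinct words in the training sentences

# Occurrences of each test-sentence word in the positive and negative sentences.
COUNTS = {
    "+": {"i": 1, "hated": 0, "the": 1, "poor": 0, "acting": 1},
    "-": {"i": 1, "hated": 1, "the": 1, "poor": 1, "acting": 1},
}


def score(tokens, label):
    """Prior times the product of add-one smoothed word likelihoods."""
    likelihoods = [
        Fraction(COUNTS[label].get(t, 0) + 1, TOTAL_WORDS[label] + KEYWORDS)
        for t in tokens
    ]
    return PRIOR[label] * prod(likelihoods)


tokens = "i hated the poor acting".split()
scores = {label: score(tokens, label) for label in ("+", "-")}
predicted = max(scores, key=scores.get)
```

The negative score is roughly twenty times the positive score, so `predicted` is "-". For longer tweets the products become very small, and summing logarithms of the probabilities is the usual way to avoid floating-point underflow.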
APPLICATIONS
• Dissatisfaction-oriented online advertising
• Online commerce
  - Ex: Brand A or B? Quality X or Y? Feature C or D?
• Voting advice applications
• Clarification of politicians' positions
• Real-world events monitoring
  - Ex: Leader A or B?
• Legal matters: "blawgs" (a subset of blogs)
• Policy or government-regulation proposals
• Intelligent transportation systems
  - Ex: Is the movement / law proposal advantageous?
FUTURE SCOPE
• Using other models and algorithms.
• Temporal analysis.
• Data pre-processing using more parameters to capture sentiment more reliably.
• Better accuracy in processing human sentiments.
• Updating the dictionary with new synonyms and antonyms of existing words.
• The web application can be converted into a mobile application.
• Context-based sentiment analysis may be implemented in future to improve accuracy.
CONCLUSION
• "What others think" is important.
• Sentiment analysis, or opinion mining, is a field of study that analyzes people's sentiments, attitudes or emotions towards certain entities.
• Supervised algorithms are still an open field for research.
• Naïve Bayes and support vector machines are the most frequently used ML algorithms for solving the sentiment classification (SC) problem.
• Micro-blogs, blogs and forums, as well as news sources, are widely used for sentiment analysis.
• Hence we conclude that Twitter can be the best platform for sentiment analysis.
REFERENCES
• https://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0015-2
• http://ijiet.com/wp-content/uploads/2016/04/37.pdf
• https://github.com/mayank93/Twitter-Sentiment-Analysis
• http://www.pythonforbeginners.com/systemsprogramming/using-the-csv-module-in-python/
• http://www.academia.edu/6723240/Mining_Opinion_Features_in_Customer_Reviews
• http://content26.com/blog/bing-liu-the-science-of-detecting-fake-reviews/
• http://www.scienceforseo.com
• http://help.sentiment140.com/for-students
• Ronen Feldman, "Techniques and Applications for Sentiment Analysis", Communications of the ACM, vol. 56, no. 4, April 2013.
• Research paper: Bob Prieto, "Utilization of Project Sentiment Analysis as a Project Performance Predictor".
• Research paper: Chetashri Bhadane, Hardi Dalal and Heenal Doshi, "Sentiment Analysis: Measuring Opinions".
• Research paper: Nurfadhlina Mohd Sharef, Harnani Mat Zin and Samaneh Nadali, "Overview and Future Opportunities of Sentiment Analysis Approaches for Big Data".