Text Classification & Sentiment Analysis
Muhammad Atif Qureshi
Arjumand Younus
Contents
● An Introduction to Text Classification
– Text Classification Examples
– Text Classification Methods
● Naive Bayes
– Formalization
– Learning
● Applications of Sentiment Analysis
● Baseline Algorithm for Sentiment Analysis
● Sentiment Lexicons
● Sentiment Analysis for the Political Domain (Personal Research)
Text Classification Examples
● News filtering and organization
● Document organization and retrieval
● Sentiment analysis/Opinion mining
● Email classification and spam filtering
● Authorship attribution
Spam Classification Example
Slide borrowed from the Coursera lectures on “Natural Language Processing” by Prof. Dan Jurafsky
Text Classification
● A set of training documents D = {d1, ..., dN} such that each document is labeled with a class value c from C = {c1, ..., cJ}
● A classification model relates the features in the training data to the labels
● The classification model predicts the label of an unknown (test) document
● In text classification, the model uses text-based features
Text Classification Methods
● Hand-coded rules
● Supervised machine learning
– Naive Bayes
– Logistic regression
– Support vector machines
– K-nearest neighbors
Naive Bayes
● A simple (“naive”) classification method based on Bayes' rule
● Relies on a simple document representation, namely the bag of words
I love this movie. It's sweet but with satirical humor. The
dialogue is great and the adventure scenes are great
fun...It manages to be whimsical and romantic while
laughing at the conventions of the fairy tale genre. I
would recommend it to just about anyone. I've seen it
several times as I love it so much, and I'm always
happy to see it again whenever I have a friend who
hasn't seen it yet.
Bag of Words Representation: Subset of Words
Counts for a subset of the words in the review above:
Word | Count
great | 2
love | 2
recommend | 1
laugh | 1
happy | 1
... | ...
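To make the bag-of-words idea concrete, here is a minimal Python sketch that counts the words of the review above. The tokenizer (lowercasing plus a simple regular expression) and the name `bag_of_words` are assumptions for illustration; it does no stemming, so the slide's laugh appears here as laughing.

```python
import re
from collections import Counter

REVIEW = (
    "I love this movie. It's sweet but with satirical humor. The dialogue is "
    "great and the adventure scenes are great fun...It manages to be whimsical "
    "and romantic while laughing at the conventions of the fairy tale genre. I "
    "would recommend it to just about anyone. I've seen it several times as I "
    "love it so much, and I'm always happy to see it again whenever I have a "
    "friend who hasn't seen it yet."
)


def bag_of_words(text):
    """Lowercase the text, split it into word tokens and count each token.

    Word order is thrown away: only the per-word counts survive.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


counts = bag_of_words(REVIEW)
for word in ["great", "love", "recommend", "laughing", "happy"]:
    print(word, counts[word])
```

Any two documents with the same word counts get the same representation, whatever the order of their words.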
Bayes' Rule Applied to Documents and Classes
● For a document d and a class c:
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
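Naive Bayes Classifier (1/3)
$$c_{MAP} = \arg\max_{c \in C} P(c \mid d) = \arg\max_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)} = \arg\max_{c \in C} P(d \mid c)\,P(c)$$
● MAP stands for “maximum a posteriori”, i.e. the most likely class. The second step applies Bayes' rule; the third drops the denominator, because P(d) is the same for every class and so cannot change which class attains the maximum.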
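A small numeric check of why the denominator can be dropped, using made-up values of P(d | c) and P(c) for two hypothetical classes "pos" and "neg": normalizing by P(d) turns the scores into posteriors but leaves the winning class unchanged.

```python
# Hypothetical numbers for one document d and two classes.
likelihood = {"pos": 0.002, "neg": 0.0005}  # P(d | c)
prior = {"pos": 0.4, "neg": 0.6}            # P(c)

# Numerator of Bayes' rule for each class: P(d | c) * P(c).
joint = {c: likelihood[c] * prior[c] for c in prior}

# P(d) sums the joint over all classes (assuming the classes are
# mutually exclusive and exhaustive).
p_d = sum(joint.values())
posterior = {c: joint[c] / p_d for c in joint}  # P(c | d)

# Dividing every score by the same positive P(d) leaves the argmax unchanged.
best_by_posterior = max(posterior, key=posterior.get)
best_by_joint = max(joint, key=joint.get)
print(best_by_posterior, best_by_joint)  # pos pos
```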
Naive Bayes Classifier (2/3)
$$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c) = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$
● The document d is represented as features x1, ..., xn
● P(c): how often does this class occur? We can simply count the relative frequencies in a corpus.
Naive Bayes Classifier (3/3)
$$c_{MAP} = \arg\max_{c \in C} P(d \mid c)\,P(c) = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$
● The likelihood P(x1, x2, ..., xn | c) has $O(|X|^n \cdot |C|)$ parameters
● It could only be estimated if a very, very large number of training examples were available.
Multinomial Naive Bayes Independence Assumptions
● Bag of Words assumption: assume position doesn't matter
● Conditional independence: assume the feature probabilities P(xi | cj) are independent given the class c:
$$P(x_1, x_2, \ldots, x_n \mid c) = P(x_1 \mid c) \cdot P(x_2 \mid c) \cdots P(x_n \mid c)$$
Multinomial Naive Bayes Classifier
positions ← all word positions in test document
$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$
Multinomial Naive Bayes Classifier
$$c_{MAP} = \arg\max_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$
$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{x \in X} P(x \mid c_j)$$
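The decision rule above can be sketched in Python. One assumption beyond the slide: the product is computed as a sum of logarithms. Since log is increasing, the argmax is unchanged, and this avoids floating-point underflow when a document has many words. The function name, the parameter values, and the choice to skip out-of-vocabulary words are hypothetical.

```python
import math


def classify_nb(tokens, prior, cond_prob):
    """Return the class maximizing P(c) * prod_i P(x_i | c), computed in log space.

    prior:     dict class -> P(c)
    cond_prob: dict class -> dict word -> P(word | c); all classes are assumed
               to share one vocabulary, and every probability must be > 0.
    """
    best_class, best_score = None, -math.inf
    for c, p_c in prior.items():
        score = math.log(p_c)
        for w in tokens:           # one factor per word position
            if w in cond_prob[c]:  # assumption: skip out-of-vocabulary words
                score += math.log(cond_prob[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical parameters, as if already estimated from training data.
prior = {"pos": 0.5, "neg": 0.5}
cond_prob = {
    "pos": {"great": 0.30, "boring": 0.05, "movie": 0.65},
    "neg": {"great": 0.05, "boring": 0.40, "movie": 0.55},
}
print(classify_nb(["great", "movie"], prior, cond_prob))          # pos
print(classify_nb(["boring", "movie", "xyz"], prior, cond_prob))  # neg
```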
Learning the Multinomial Naive Bayes Model
● First attempt: maximum likelihood estimates
– simply use frequencies in the data
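The slide's estimation formulas did not survive extraction. The standard maximum likelihood estimates for multinomial Naive Bayes are sketched below as an assumption, where doccount(C = c_j) is the number of training documents labeled c_j, N_doc the total number of training documents, count(w_i, c_j) the number of occurrences of w_i in documents of class c_j, and V the vocabulary (notation introduced here):

$$\hat{P}(c_j) = \frac{\mathrm{doccount}(C = c_j)}{N_{doc}}$$

$$\hat{P}(w_i \mid c_j) = \frac{\mathrm{count}(w_i, c_j)}{\sum_{w \in V} \mathrm{count}(w, c_j)}$$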
Parameter Estimation
● Create a mega-document for topic j by concatenating all docs in this topic
– Use the frequency of w in the mega-document
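A minimal Python sketch of the mega-document estimate, assuming the training documents are already tokenized and grouped by class in a dict (a hypothetical input format; `mle_word_probs` and the toy documents are illustrative). The last line shows that a word never seen in a class gets probability zero.

```python
from collections import Counter


def mle_word_probs(docs_by_class):
    """Maximum likelihood estimate of P(w | c) via one mega-document per class.

    docs_by_class: dict class -> list of token lists
    Returns dict class -> dict word -> P(w | c) for words seen in that class.
    """
    probs = {}
    for c, docs in docs_by_class.items():
        # Concatenate every document of class c into one mega-document.
        mega_doc = [tok for doc in docs for tok in doc]
        counts = Counter(mega_doc)
        total = len(mega_doc)
        # Relative frequency of each word in the mega-document.
        probs[c] = {w: n / total for w, n in counts.items()}
    return probs


docs = {
    "pos": [["great", "fun"], ["great", "movie"]],
    "neg": [["boring", "movie"]],
}
probs = mle_word_probs(docs)
print(probs["pos"]["great"])               # 0.5 (2 of 4 tokens)
print(probs["pos"].get("fantastic", 0.0))  # 0.0 for an unseen word
```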
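Problem with Maximum Likelihood
● What if we have seen no training documents that contain the word fantastic and are classified as positive? Then the estimate P(fantastic | positive) is zero.
● Zero probabilities cannot be conditioned away, no matter the other evidence! Because the classifier multiplies the word probabilities, a single zero makes the score of the whole class zero.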
Laplace (add-1) Smoothing for Naive Bayes
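Add-1 smoothing adds one to every word count, so that no vocabulary word ends up with probability zero. The usual form of the estimate is

$$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V} \mathrm{count}(w, c) + |V|}$$

A minimal sketch follows. The function name and input format are hypothetical, and the vocabulary is assumed to contain every token of the class (as it does when extracted from the whole training corpus), so the class length equals the sum of its word counts.

```python
from collections import Counter


def add_one_word_probs(class_tokens, vocabulary):
    """Laplace (add-1) estimate of P(w | c) for every w in the vocabulary.

    class_tokens: all tokens of class c (its concatenated mega-document)
    vocabulary:   set of all word types in the training corpus
    """
    counts = Counter(class_tokens)  # missing words count as 0
    denom = len(class_tokens) + len(vocabulary)  # sum_w count(w, c) + |V|
    return {w: (counts[w] + 1) / denom for w in vocabulary}


vocab = {"great", "fun", "movie", "boring", "fantastic"}
pos_tokens = ["great", "fun", "great", "movie"]
p = add_one_word_probs(pos_tokens, vocab)
print(p["great"], p["fantastic"])  # 3/9 and 1/9: the unseen word is no longer zero
```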
Multinomial Naive Bayes: Learning
● From the training corpus, extract the Vocabulary
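Putting the learning steps together, here is a minimal sketch of training (vocabulary extraction, class priors, add-1 smoothed likelihoods) and prediction. It assumes documents arrive as (tokens, label) pairs. The function names and the toy training set are hypothetical, and log probabilities are stored to avoid underflow when many factors are multiplied.

```python
import math
from collections import Counter


def train_multinomial_nb(labeled_docs):
    """Train multinomial Naive Bayes with add-1 smoothing.

    labeled_docs: list of (tokens, class_label) pairs.
    Returns (vocabulary, log_prior, log_likelihood).
    """
    # 1. Extract the vocabulary from the whole training corpus.
    vocabulary = {tok for tokens, _ in labeled_docs for tok in tokens}
    n_docs = len(labeled_docs)
    log_prior, log_likelihood = {}, {}
    for c in {label for _, label in labeled_docs}:
        class_docs = [tokens for tokens, label in labeled_docs if label == c]
        # 2. Prior: fraction of training documents labeled c.
        log_prior[c] = math.log(len(class_docs) / n_docs)
        # 3. Word counts in the mega-document of class c.
        counts = Counter(tok for tokens in class_docs for tok in tokens)
        denom = sum(counts.values()) + len(vocabulary)
        # 4. Add-1 smoothed likelihood for every vocabulary word.
        log_likelihood[c] = {w: math.log((counts[w] + 1) / denom) for w in vocabulary}
    return vocabulary, log_prior, log_likelihood


def predict(tokens, vocabulary, log_prior, log_likelihood):
    """Pick the class maximizing log P(c) + sum of log P(w | c); unknown words are skipped."""
    def score(c):
        return log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in vocabulary)
    return max(log_prior, key=score)


train = [
    (["great", "fun", "movie"], "pos"),
    (["loved", "it", "great"], "pos"),
    (["boring", "movie"], "neg"),
    (["awful", "boring", "plot"], "neg"),
]
model = train_multinomial_nb(train)
print(predict(["great", "plot"], *model))  # pos
```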
Multinomial Naive Bayes: A Worked Example
Sentiment Analysis Overview
Sentiment Analysis Applications (1/4)
● Movie: is this review positive or negative?
● Products: what do people think about the new iPhone?
● Public sentiment: how is consumer confidence? Is despair
increasing?
● Politics: what do people think about this candidate or
issue?
● Prediction: predict election outcomes or market trends
from sentiment
Sentiment Analysis Applications (2/4)
Sentiment Analysis Applications (3/4)
Sentiment Analysis Applications (4/4)
Formal Definition of Sentiment Analysis
● Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
➢ From a set of types
• like, love, hate, value, desire, etc.
➢ Or (more commonly) simple weighted polarity:
• positive, negative, neutral, together with strength
4. Text containing the attitude
➢ Sentence or entire document
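One way to hold the four components of this definition in code, as a sketch: the `Attitude` class, its field names, and the choice of a polarity label plus a numeric strength (the “more common” weighted-polarity option above) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Attitude:
    """One detected attitude, mirroring the four components of the definition."""
    holder: str  # 1. who holds the attitude (source)
    target: str  # 2. what the attitude is about (aspect)
    polarity: Literal["positive", "negative", "neutral"]  # 3. type, as polarity
    strength: float  # 3. strength of that polarity, e.g. in [0, 1]
    text: str  # 4. sentence or document containing the attitude


review = Attitude(
    holder="reviewer",
    target="movie",
    polarity="positive",
    strength=0.9,
    text="I love this movie.",
)
print(review)
```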
Sentiment Analysis Tasks
● Simplest:
– Is the attitude of this text positive or negative?
● More complex:
– Rank the attitude of this text from 1 to 5
● Advanced:
– Detect the target, source, or complex attitude types
Sentiment Analysis: A Baseline Algorithm
● Polarity detection in movie reviews:
– Is an IMDB movie review positive or negative?
● Data: Polarity Data 2.0:
– http://www.cs.cornell.edu/people/pabo/movie-review-data/
Baseline Algorithm (adapted from Pang and Lee)
● Tokenization
● Feature Extraction
● Classification using different classifiers
– Naive Bayes
– MaxEnt
– SVM
Sentiment Tokenization Issues
● Deal with HTML and XML markup
● Twitter markup (names, hashtags)
● Capitalization (preserve for words in all caps)
● Phone numbers, dates
● Emoticons
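A minimal regex tokenizer sketch covering several of these issues. It strips HTML/XML tags, keeps Twitter names, hashtags, phone numbers, slash-separated dates and common emoticons as single tokens, and preserves all-caps words while lowercasing other words. The patterns are simplified assumptions (for example, only US-style 10-digit phone numbers) rather than a complete tokenizer.

```python
import html
import re

# Alternatives are tried left to right, so the more specific patterns come first.
# All patterns are simplified illustrations, not a complete tokenizer.
TOKEN_RE = re.compile(
    r"""
      \d{3}[-.\s]?\d{3}[-.\s]?\d{4}          # phone numbers, e.g. 555-123-4567
    | \d{1,2}/\d{1,2}/\d{2,4}                # slash-separated dates, e.g. 8/11/2013
    | [<>]?[:;=8][-o*']?[)\](\[dDpP/\\}{@|]  # common emoticons, e.g. :) ;-( :D
    | @\w+                                   # Twitter user names
    | \#\w+                                  # hashtags (the hash is escaped in VERBOSE mode)
    | [A-Za-z]+(?:'[A-Za-z]+)?               # words, optionally with an apostrophe part
    """,
    re.VERBOSE,
)


def tokenize(text):
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML/XML tags
    text = html.unescape(text)            # decode entities such as &amp;
    tokens = TOKEN_RE.findall(text)
    # Lowercase ordinary words but keep all-caps words (e.g. GREAT), which may
    # signal emphasis; emoticons, @names, #tags, numbers and dates stay as-is.
    return [t.lower() if t[0].isalpha() and not t.isupper() else t for t in tokens]


print(tokenize("<b>This movie was GREAT</b> :) #mustwatch @critic"))
```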
Extracting Features for Sentiment Classification
● How to handle negation
– I didn't like this movie
vs.
– I really like this movie
● Which words to use?
– Only adjectives
– All words
Negation
● Add NOT_ to every word between a negation and the following punctuation:
Didn't like this movie, but I
becomes
Didn't NOT_like NOT_this NOT_movie, but I
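The NOT_ rule above can be sketched as follows. It assumes the text is already split into word and punctuation tokens, and that negation is signalled by not, no, never, or a word ending in n't (an assumed cue list; the slide does not give one). `mark_negation` is a hypothetical name.

```python
import re

NEGATION_RE = re.compile(r"^(?:not|no|never|.*n't)$", re.IGNORECASE)  # assumed cues
PUNCT_RE = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if PUNCT_RE.match(tok):
            negating = False  # punctuation ends the negation scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION_RE.match(tok):
                negating = True  # start marking from the next token


    return out


tokens = re.findall(r"[\w']+|[.,:;!?]", "didn't like this movie, but I")
print(" ".join(mark_negation(tokens)))  # didn't NOT_like NOT_this NOT_movie , but I
```

The marked tokens can then be fed to the Naive Bayes classifier as ordinary words, so NOT_like and like get separate probability estimates.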
Reminder: Naive Bayes
Sentiment Lexicons
● Dictionary of well-known “sentiment” words
– Abusive terms
– Adjectives like bad, worse, good, better, ugly, pretty
● Available for use in research
– LIWC: Linguistic Inquiry and Word Count
– SentiStrength
– Bing Liu's Opinion Lexicon
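A lexicon can drive a very simple baseline by counting matches. The sketch below uses a tiny hypothetical word list built from the example adjectives above plus a few extra words. It does not load LIWC, SentiStrength, or Bing Liu's lexicon and makes no assumptions about their file formats, and it ignores negation.

```python
import re

# Tiny hypothetical lexicon; real research lexicons are far larger.
POSITIVE = {"good", "better", "pretty", "great", "love"}
NEGATIVE = {"bad", "worse", "ugly", "boring", "awful"}


def lexicon_score(text):
    """Number of positive lexicon hits minus number of negative hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return pos - neg


def lexicon_label(text):
    score = lexicon_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(lexicon_label("A good movie, but the ending was bad and ugly"))  # negative
```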
My Research: Election Trolling on Twitter (Pakistan Elections 2013)
Twitterer | Tweet
A | @B Yeh...#Shame with fake account, this is how PTIians think they will get votes
B | @A Stop making a fuss and fuck off.
A | @B A dumb leader like IK can produce followers like you.
B | @A A corrupt leader like Noora can hire paid trolls like you
