Author / Scientific advisor
Politehnica University of Bucharest
Faculty of Automatic Control and Computers
Department of Computers
Sentiment-Based Text Segmentation
• Costin-Gabriel Chiru • Ştefan Trăuşan-Matu
Costin-Gabriel CHIRU
Politehnica University of
Bucharest
E-mail:
costin.chiru@cs.pub.ro
Asmelash Teka HADGU
Erasmus Mundus master
Politehnica University of
Bucharest
asmelashtk@gmail.com
Content
• Introduction
• Literature Review
• Proposed Solution
• System Architecture
• Results
• Conclusions
Sentiment-Based Text Segmentation, 02/26/19, ICSCS 2013
Introduction
• Goal: Help users decide what products to buy
• How?
– Using social knowledge available for those
products.
– And NLP (Text Mining) techniques for detecting
polarity and summarizing opinions regarding
those products or different aspects of those
products.
Other Approaches
• Surveys on opinion mining & sentiment analysis:
– Sentiment Analysis and Subjectivity – Liu, 2010
– Opinion mining and sentiment analysis – Pang and Lee, 2008
• Opinion mining / Sentiment analysis - used to identify the
sentiment orientation of the opinions in a document
• Most applications use:
– Ontologies/thesauri: SentiWordNet, General Inquirer,
– Different annotated corpora,
– Linguistic heuristics or a pre-selected set of seed words,
– Search engine results (Turney, 2002)
to learn specific features that can be used to classify other texts.
• Text segmentation - intensely treated, starting with Allan et al., 1998
– BUT not text segmentation according to sentiments.
Proposed Solution (I)
• Our solution for sentiment-based text
segmentation in the context of product
reviews:
– The identification of product features;
– The extraction of opinions associated
with these features;
– Sentiment polarity classification.
[Figure: solution pipeline. Identification and Extraction of Opinion Words (POS Tagging, Heuristics; outputs: Product Features, Opinion Words) → Sentiment Polarity Classification (Sentiment Lexicon, Assign Polarity) → Segmentation and Visualization (Text Segments, Visualization)]
Proposed Solution (II)
• The identification of product features
– Identify the nouns and noun phrases in the reviews using
POS tagging → possible product features
– Use the TF-IDF technique to select the most frequent
ones → probable product features
– Use WordNet to exploit the relationships between synsets
• We have built the word cloud for the most important terms
extracted from reviews of digital cameras
(http://www.photographyreview.com).
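The first two steps above can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes the reviews are already POS-tagged with Penn Treebank tags, sums a smoothed TF-IDF weight per noun, and uses invented toy reviews. The names `rank_candidate_features` and `TOY_REVIEWS` are hypothetical.

```python
import math
from collections import Counter

# Toy, hand-tagged reviews (invented for illustration, not from the corpus).
TOY_REVIEWS = [
    [("The", "DT"), ("lens", "NN"), ("is", "VBZ"), ("sharp", "JJ"),
     ("and", "CC"), ("the", "DT"), ("battery", "NN"), ("lasts", "VBZ")],
    [("Great", "JJ"), ("lens", "NN"), ("but", "CC"), ("weak", "JJ"),
     ("flash", "NN")],
    [("The", "DT"), ("lens", "NN"), ("cap", "NN"), ("broke", "VBD")],
]

def rank_candidate_features(tagged_reviews):
    """Rank nouns by their TF-IDF weight summed over all reviews."""
    # Step 1: nouns (tags starting with NN) are the possible features.
    noun_docs = [
        [word.lower() for word, tag in review if tag.startswith("NN")]
        for review in tagged_reviews
    ]
    n_docs = len(noun_docs)
    # Document frequency: in how many reviews each noun occurs.
    doc_freq = Counter(term for doc in noun_docs for term in set(doc))
    # Step 2: accumulate TF-IDF; the smoothed IDF variant is an assumption.
    scores = Counter()
    for doc in noun_docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    return scores.most_common()

ranked = rank_candidate_features(TOY_REVIEWS)
print(ranked[:3])
```

The top-ranked nouns would then be the probable product features; the WordNet step (grouping related synsets) is not covered by this sketch.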
Proposed Solution (III)
• The extraction of opinions associated with the extracted
features
– We extracted the adjectives that appear close to the words
denoting the product features
– Deeper analysis can use parse information and manually or semi-
automatically developed rules or sentiment-relevant lexicons.
• Sentiment polarity classification
– Once the (product feature, reviewer opinion) pairs are known,
we can evaluate the polarity of the sentiments expressed by these
opinions
– Once each opinion is tagged, we use the majority value (positive
or negative) to decide whether that feature has a positive impact
on the reviewers or a negative one
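A minimal sketch of the two steps above, under stated assumptions: "close to" is read as a fixed token window (the window size of 3 is an assumption), the feature set and the tiny lexicon are invented for illustration, and the helper names are hypothetical. The tagged sentence is the test text used later in the Results slide.

```python
TAGGED = ("This/DT is/VBZ a/DT great/JJ camera/NN ./. Though/IN the/DT "
          "pictures/NNS can/MD get/VB a/DT bit/NN blurred/VBD at/IN "
          "times/NNS ,/, it/PRP 's/VBZ awesome/JJ for/IN the/DT price/NN ./.")

# Hypothetical inputs: feature list and a toy polarity lexicon.
FEATURES = {"camera", "pictures", "price"}
LEXICON = {"great": 1, "awesome": 1, "blurred": -1}

def parse_tagged(text):
    """Turn 'word/TAG' items into (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]

def opinions_near_features(tokens, features, window=3):
    """Collect adjectives found within `window` tokens of each feature."""
    pairs = {}
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() in features:
            nearby = tokens[max(0, i - window): i + window + 1]
            pairs.setdefault(word.lower(), []).extend(
                w.lower() for w, t in nearby if t.startswith("JJ"))
    return pairs

def majority_polarity(opinions, lexicon):
    """Majority vote over the polarities of the known opinion words."""
    votes = [lexicon[o] for o in opinions if o in lexicon]
    positive = sum(1 for v in votes if v > 0)
    negative = sum(1 for v in votes if v < 0)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "undecided"

pairs = opinions_near_features(parse_tagged(TAGGED), FEATURES)
verdicts = {f: majority_polarity(ops, LEXICON) for f, ops in pairs.items()}
print(pairs)
print(verdicts)
```

Because "blurred" is tagged VBD rather than JJ, this adjective-only sketch finds no opinion for "pictures", which illustrates why the later slides consider other parts of speech.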
System Architecture
• 4 steps:
– POS tagging → adjectives / BOW (bag-of-words) + dictionary of
sentiment words
– Opinion words extraction
– Sentiment assessment → SentiWordNet / lexicon designed by Hu
and Liu, 2004, enriched with domain-specific words (using TF-IDF,
POS tagging and manual annotation)
– Segmentation → put segmentation markers (||) where the polarity
shifts
[Figure: processing flow. Get Text (reviews) → POS Tagging or BOW approach → Identify the Sentiment Words → Assign Polarity → Text segmentation]
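The segmentation step of this architecture can be sketched for the bag-of-words variant as below. The slides do not spell out exactly where a marker goes, so this sketch assumes the marker is placed immediately before the sentiment word on which the shift is detected; the three-word lexicon and the function name `segment_by_polarity` are hypothetical.

```python
# Toy sentiment dictionary (assumption): +1 positive, -1 negative.
LEXICON = {"great": 1, "awesome": 1, "blurred": -1}

def segment_by_polarity(text, lexicon):
    """Insert '||' before each sentiment word whose polarity differs
    from that of the previous sentiment word."""
    out = []
    previous = None  # polarity of the last sentiment word seen
    for token in text.split():
        polarity = lexicon.get(token.strip(".,;:!?").lower())
        if polarity is not None:
            if previous is not None and polarity != previous:
                out.append("||")
            previous = polarity
        out.append(token)
    return " ".join(out)

TEXT = ("This is a great camera. Though the pictures can get a bit "
        "blurred at times, it's awesome for the price.")
segmented = segment_by_polarity(TEXT, LEXICON)
print(segmented)
```

With this rule the markers land before "blurred" and before "awesome"; other placement rules are equally possible.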
Results
• Test text: This is a great camera. Though the pictures can get a bit
blurred at times, it's awesome for the price.
• BOW method results (three sentiment words: great, blurred and
awesome; great and awesome are positive, blurred is negative):
– This is a great camera. Though the pictures can get a bit || blurred || at
times, it's awesome for the price.
• POS tagging method results:
– POS tagging: This/DT is/VBZ a/DT great/JJ camera/NN ./. Though/IN
the/DT pictures/NNS can/MD get/VB a/DT bit/NN blurred/VBD at/IN
times/NNS ,/, it/PRP 's/VBZ awesome/JJ for/IN the/DT price/NN ./.
– The adjectives are identified (great and awesome; “blurred” is tagged
VBD, so it is not among them) and their valences are evaluated
according to SentiWordNet: “great” is considered to be objective and
“awesome” is considered to be positive → the whole phrase is
categorized as positive because no polarity shift was detected.
Improving Results (I)
• Improving the sentiment words recognition:
– POS tagging method: use the average valence of a
given word instead of simply considering its first
sense → still not powerful enough →
– Combine the two methods by building an extended
list comprising the words from the sentiment
words dictionary, along with the adjectives from
SentiWordNet → if still not powerful enough →
– Enhance this list with words having other POS
than the ones already considered (for example
adverbs and verbs).
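The difference between the first-sense and the average-valence strategies can be illustrated with a short sketch. The per-sense scores below are invented placeholders, not actual SentiWordNet values, and the threshold and function names are assumptions; each sense is a (positive score, negative score) pair, listed in sense order.

```python
# Hypothetical per-sense (positive, negative) scores; NOT real SentiWordNet data.
TOY_SENSES = {
    "great": [(0.0, 0.0), (0.75, 0.0), (0.5, 0.0)],
}

def first_sense_valence(senses):
    """Valence of the first (most common) sense only."""
    positive, negative = senses[0]
    return positive - negative

def average_valence(senses):
    """Mean valence over all senses of the word."""
    return sum(p - n for p, n in senses) / len(senses)

def classify(valence, threshold=0.1):
    """Map a valence to a label; the threshold is an assumed cut-off."""
    if valence > threshold:
        return "positive"
    if valence < -threshold:
        return "negative"
    return "objective"

first_label = classify(first_sense_valence(TOY_SENSES["great"]))
average_label = classify(average_valence(TOY_SENSES["great"]))
print(first_label, average_label)
```

With these placeholder scores the first sense alone looks objective while the average is positive, which mirrors the behaviour reported for "great" in the Results slide.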
Improving Results (II)
• Improving segmentation:
– Use the Stanford Parser to place the boundaries at natural places rather than exactly
where the shifts are detected → go up from the sentiment words until reaching the first
conflict and classify each sub-tree according to the expressed sentiment.
[Figure: parse tree of the second sentence. Root → S, whose children are SBAR ("Though the pictures can get a bit blurred at times"), ",", NP ("it"), VP ("'s awesome for the price") and "."]
The final segmentation would be:
This is a great camera. || Though the pictures can get a bit
blurred at times ||, it's awesome for the price.
(ROOT
  (S
    (NP (DT This))
    (VP (VBZ is)
      (NP (DT a) (JJ great) (NN camera)))
    (. .)))
(ROOT
  (S
    (SBAR (IN Though)
      (S
        (NP (DT the) (NNS pictures))
        (VP (MD can)
          (VP (VB get)
            (SBAR
              (S
                (NP (DT a) (NN bit))
                (VP (VBD blurred)
                  (PP (IN at)
                    (NP (NNS times))))))))))
    (, ,)
    (NP (PRP it))
    (VP (VBZ 's)
      (ADJP (JJ awesome)
        (PP (IN for)
          (NP (DT the) (NN price)))))
    (. .)))
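The "go up until the first conflict" idea can be sketched on the second parse tree as follows. This is one possible reading, not the authors' code: it walks the tree top-down and keeps the largest sub-trees that contain sentiment words of a single polarity, which yields the same sub-trees as climbing from each sentiment word until a conflicting polarity appears. The bracket parser, the toy lexicon and all function names are hypothetical.

```python
import re

LEXICON = {"great": "positive", "awesome": "positive", "blurred": "negative"}

TREE = (
    "(ROOT (S"
    " (SBAR (IN Though) (S (NP (DT the) (NNS pictures))"
    " (VP (MD can) (VP (VB get) (SBAR (S (NP (DT a) (NN bit))"
    " (VP (VBD blurred) (PP (IN at) (NP (NNS times))))))))))"
    " (, ,) (NP (PRP it))"
    " (VP (VBZ 's) (ADJP (JJ awesome) (PP (IN for) (NP (DT the) (NN price)))))"
    " (. .)))"
)

def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0
    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)
    return read()

def leaves(node):
    """Words under a node, in order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

def pure_subtrees(node):
    """Largest sub-trees whose sentiment words share one polarity."""
    if isinstance(node, str):
        return []
    found = {LEXICON[w.lower()] for w in leaves(node) if w.lower() in LEXICON}
    if not found:
        return []
    if len(found) == 1:
        return [(" ".join(leaves(node)), found.pop())]
    # Conflict at this node: descend into the children.
    return [seg for child in node[1] for seg in pure_subtrees(child)]

segments = pure_subtrees(parse_tree(TREE))
print(segments)
```

The two sub-trees found here correspond to the SBAR and VP branches of the tree, in line with the boundary placed after "at times" in the final segmentation.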
Conclusions
• We implemented two approaches for sentiment-based
text segmentation:
– One based on POS tagging and some heuristics for
identifying the sentiment words’ valence using
SentiWordNet.
– One based on the bag-of-words approach and a sentiment
words dictionary provided by Hu and Liu.
• Since the results were not satisfactory, we considered
several ways of improving them:
– Combining the two methods, or
– Using different existing resources (such as ANEW), or
– Including the words with other POS tags in our analysis, and
– Using phrase parse trees to segment the text better.
Questions
Thank you very much!
