Sentiment Analysis of Social Media Content: A multi-tool for listening to your audience and developing sentimental content strategies.

Machine Learning for Big Data
Prof. Dr. Eirini Ntoutsi
Leibniz University Hannover & L3S Research Center
EUMade4All Workshop, Hannover, 29.9.2017

Outline
 A world of opinions
 Analyzing opinions for sentiment
 Using sentimental content

A Web/World of opinions
 With the advent of Web 2.0 and its social character, a lot of opinion-rich
resources have arisen

Opinions

Opinions vs Facts
 Facts
 Screen: 4.7in LCD 1334x750 (326ppi)
 Processor: Apple A11 Bionic
 RAM: 2GB of RAM
 Storage: 64/256GB
 Operating system: iOS 11
 Camera: 12MP rear camera, 7MP front-facing camera
 Connectivity: LTE, Wi-Fi ac, NFC, Bluetooth 5, Lightning
and GPS
 Dimensions: 138.4 x 67.3 x 7.3mm
 Weight: 148g
 Opinions

Opinions on everything

Why do we care?
 Opinions are produced on a constant basis and are (most of
the time) freely available
 Free feedback from our customers/users
 Valuable source of information for companies, politicians [1],
decision makers
 Companies turn to social media monitoring in order to
optimize and strengthen their products and brands
 An opportunity for marketers to pay attention to
consumers’ feelings towards their brand
 People have the power to influence each other in their
decisions
 Product design could be driven by user requests
[1] https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win

Sentiment analysis
 Opinions on Vodafone
 What are we interested in?
 (Automatically) Identifying the negative tweets (and reacting … customer care)

Aspect-oriented sentiment analysis
 It’s not ALL good or bad
 Reviews from TripAdvisor on Vienna Marriott Hotel
2/5/2014: Great hotel, very nice rooms, perfect location, very nice staff except for a mid-aged female receptionist who tried to
charge me extra for wifi fees when checking out. It was waived at the desk when I checked-in. And she started treating me with
an attitude after she found out that I got a great deal through priceline.com. ….
26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular and a nice pool etc. Staff in pool
weren't Good and I found them actually quite rood. Executive lounge was ok and not busy but selection of wine and beer wasn't
great. The reception has many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great for
business travel but not sure id come again for leisure.
7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the basic stuff done right.
It's in a fine location, maybe 10 minute walk from the major city attractions while being in a quiet area. Breakfast buffet
exceptional and good fitness center. Very helpful and happy staff.
Lobby lounge just okay. Not a good wine selection and the Sinatra-like singer adds nothing.
Maybe just a little more expensive than it should be, too.
 What are we interested in?
 What people are talking about (items and item aspects)
 The attitude of people towards these items and aspects

(Sentiment- & aspect-based) opinion summarization

Sentiment analysis: an umbrella term
 The Sentiment Analysis task
 Is a given text positive, negative, or neutral?
 Text = a sentence, a tweet, a customer review, a document …
 The Emotion Analysis task
 What emotion is being expressed in a given piece of text?
 Basic emotions: joy, sadness, fear, anger,…
 Other emotions: guilt, pride, optimism, frustration,…
 The Aspect-oriented Sentiment Analysis task
 What are the product/entity aspects discussed in a text?
 What is the sentiment of those aspects?
 The Summarization task
 What are the key aspects in users’ opinions? What is the predominant
sentiment?

Outline

Building a sentiment classifier
 Building a sentiment classifier requires data and algorithms
[Figure: Algorithm → Model f(x)]

Challenges of sentiment analysis in social media
 Language-related & medium-related challenges
 Informal
 Short, 140 characters for tweets
 Abbreviations and shortenings
 Wide array of topics and large vocabulary
 Spelling mistakes and creative spellings
 Special strings like hashtags, emoticons, conjoined words
 Data properties
 Large amounts of opinions (Volume)
 Continuous flow of opinions (Velocity)

Challenges of sentiment analysis in social media
 Sentiment-related challenges
 The unambiguous identification of sentiment
 Sarcasm
 Bipolarity
 Dealing with colloquial language
 tweets containing colloquial slang

Building a sentiment classifier
 Two challenging parts
 Learning: How to build a classifier?
 Labeling: How to create a (class-labeled) training set?
[Figure: Algorithm → Model f(x)]

How to build a classifier
 Preprocessing part: negations, colloquial language, superfluous words, emoticons
 Learning part / classifiers: Naïve Bayes, SVMs, ensembles, deep neural networks, k-NN, …
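As a rough illustration of the learning part, the sketch below trains a multinomial Naïve Bayes classifier with add-one smoothing on a handful of made-up tweets. The training examples, the whitespace tokenizer, and the function names `train_nb` and `predict_nb` are assumptions for illustration only, not the setup used in the talk.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_nb(examples):
    """Collect class and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
train = [
    ("great service love it", "positive"),
    ("love the new phone", "positive"),
    ("terrible network no signal", "negative"),
    ("worst customer care ever", "negative"),
]
model = train_nb(train)
print(predict_nb(model, "love the service"))
print(predict_nb(model, "no signal again"))
```

With realistic data, the tokenizer would be replaced by the preprocessing steps listed above, and the toy list by a labeled training set.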

Preprocessing - Negations
 Tagging negations with verbs
 27.222.287 verb negations found (0.4%)
 Examples: I do not like → I NOT_like; It didn't fit → It NOT_fit
 Tagging negations with adjectives
 2-part adjective co-occurrences
 3-part adjective co-occurrences
 4.832.573 adjective negations found (0.1%)
 Examples: not pretty → ugly; not bad → good; not very young → old
 Verbs negation list: www.vocabulix.com
 Adverbs negation list: www.scribd.com
[Pie chart: negation verbs 85%, negation adjectives 15%]
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017
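The negation handling on this slide could be sketched roughly as follows. Verb negations are tagged with a `NOT_` prefix, and negated adjectives are replaced by an antonym. The small `ANTONYMS` and `INTENSIFIERS` tables and the function name `tag_negations` are hypothetical stand-ins; the talk relied on much larger external negation lists.

```python
import re

# Hypothetical, tiny lookup tables for illustration only.
ANTONYMS = {"pretty": "ugly", "bad": "good", "young": "old"}
INTENSIFIERS = {"very", "really", "so"}


def tag_negations(text):
    """Rewrite simple negations: verbs get a NOT_ prefix, adjectives an antonym."""
    # Collapse "do not" / "didn't" style negations into a bare "not".
    text = re.sub(r"\b(?:do|does|did)\s+not\b", "not", text, flags=re.IGNORECASE)
    text = re.sub(r"\b\w+n't\b", "not", text)
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].lower() == "not" and i + 1 < len(tokens):
            nxt = tokens[i + 1].lower()
            # 3-part pattern: not + intensifier + adjective
            if (nxt in INTENSIFIERS and i + 2 < len(tokens)
                    and tokens[i + 2].lower() in ANTONYMS):
                out.append(ANTONYMS[tokens[i + 2].lower()])
                i += 3
            # 2-part pattern: not + adjective
            elif nxt in ANTONYMS:
                out.append(ANTONYMS[nxt])
                i += 2
            # Anything else is treated as a negated verb.
            else:
                out.append("NOT_" + tokens[i + 1])
                i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


for example in ["I do not like", "It didn't fit", "not pretty", "not very young"]:
    print(example, "->", tag_negations(example))
```

Treating every non-adjective after "not" as a verb is a simplification; a part-of-speech tagger would be needed to separate the two cases reliably.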

Preprocessing effect – Overall view (distinct words)
[Bar chart: number of distinct words (axis from 0 to 300.000.000) after each preprocessing step: original, slang, links & mentions, negations, emoticons, stopwords]
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017

(back to) Building a sentiment classifier
 Two challenging parts
 Learning: How to build a classifier?
 Labeling: How to create a (class-labeled) training set?
[Figure: Algorithm → Model f(x)]

How to create a (class-labeled) training set
 Big Data but few labels
 Human labelling at this scale is impossible
 What other (machine-based) resources can we exploit to label (part of)
our data?
 At the data level
 Labels through emoticons
 Labels through sentiment dictionaries (like SentiWordNet)
 At the machine learning model level
 Use both labeled and unlabeled data for learning → semi-supervised learning

Labels through emoticons
 Implicit labels, through emoticons
 We assembled a list of positive, negative emoticons
 72 positive emoticons, e.g. :-) :) :o) =) ;) (: (; (= <3 :D :-D :oD =D ;D
 70 negative emoticons, e.g. :( :-( :o( =( ;( ;-( ): ); )=
 We classified tweets based on their emoticons
 Positive: only positive emoticons (10%)
 Negative: only negative emoticons (2%)
 Mixed: both positive and negative emoticons (1%)
 No emoticon (88%)
 In total, 57.340.286 tweets (12%) are pure-labeled, i.e., they contain only positive or only negative emoticons.
[Pie chart: emoticons_positive 10%, no_emoticons 88%, emoticons_negative 2%, emoticons_mixed 0%]
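A minimal sketch of emoticon-based labeling, using only the emoticons shown on the slide (a subset of the full lists). The function name `emoticon_label` and the assumption that emoticons are separated from words by whitespace are illustrative choices, not details from the talk.

```python
# Emoticons as listed on the slide (subsets of the full lists).
POSITIVE_EMOTICONS = {":-)", ":)", ":o)", "=)", ";)", "(:", "(;", "(=",
                      "<3", ":D", ":-D", ":oD", "=D", ";D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":o(", "=(", ";(", ";-(", "):", ");", ")="}


def emoticon_label(tweet):
    """Label a tweet by the emoticons it contains.

    Assumes emoticons appear as whitespace-separated tokens.
    """
    tokens = set(tweet.split())
    has_pos = bool(tokens & POSITIVE_EMOTICONS)
    has_neg = bool(tokens & NEGATIVE_EMOTICONS)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "none"


tweets = [
    "great game tonight :)",
    "missed my train :(",
    "won the match :D but lost my keys :(",
    "meeting at 10",
]
for t in tweets:
    print(emoticon_label(t), "|", t)
```

Only the "positive" and "negative" outcomes would be used as training labels; "mixed" and "none" tweets stay unlabeled.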

Labels through SentiWordNet
 SentiWordNet: a lexical resource for supporting sentiment classification
 Tweet sentiment as an aggregation of the sentiment of its member words
 SentiWordNet labeling results
 Positive: only positive words
 Negative: only negative words
 Neutral: only neutral words
 Zero-sum: mix of positive and negative
 No decision: words do not exist in the lexicon
 e.g., #Iloveobama, #refugeecrisis etc
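The five lexicon-based outcomes above can be sketched with a toy dictionary. `TOY_LEXICON` is a hypothetical word-level stand-in: the real SentiWordNet assigns positivity, negativity, and objectivity scores to WordNet synsets, and the aggregation used in the talk may differ. Here "only positive words" is read as "all polar words found are positive", which is an assumption.

```python
# Hypothetical word-level polarities: +1 positive, -1 negative, 0 neutral.
TOY_LEXICON = {
    "love": 1, "great": 1, "happy": 1,
    "hate": -1, "awful": -1, "slow": -1,
    "i": 0, "the": 0, "phone": 0, "is": 0, "but": 0, "battery": 0,
}


def lexicon_label(tweet):
    """Aggregate word polarities into one of the five slide categories."""
    polarities = [TOY_LEXICON[w] for w in tweet.lower().split() if w in TOY_LEXICON]
    if not polarities:
        return "no decision"  # no word of the tweet exists in the lexicon
    has_pos = any(p > 0 for p in polarities)
    has_neg = any(p < 0 for p in polarities)
    if has_pos and has_neg:
        return "zero-sum"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


for t in ["I love the phone", "the phone is slow", "the phone",
          "I love the phone but hate the battery", "#Iloveobama #refugeecrisis"]:
    print(lexicon_label(t), "|", t)
```

The last example shows why hashtags end up as "no decision": they are single tokens that a static dictionary does not contain.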

Emoticons vs SentiWordNet
 For the 57.340.286 tweets (12%) with pure emoticon-based labels,
we checked agreement between the two labelings
 Causes of disagreement
 Emoticons-based labeling
 Prone to errors: existence of positive emoticons does not imply positive words
 SentiWordNet-based labeling
 SentiWordNet is a static dictionary
 Twitter is very dynamic
 Words change polarity (also based on context)
 New words are created (e.g. hashtags) which are not part of the dictionary
Agreement between emoticon-based labels (rows) and SentiWordNet-based labels (columns):
Emoticon-based | Positive | Negative | Neutral | Zero sum | No decision
Positive | 28.104.677 (49%) | 10.756.225 (19%) | 4.908.237 (9%) | 23.297 (0.04%) | 3.140.978 (5%)
Negative | 4.929.947 (9%) | 3.885.983 (7%) | 930.075 (2%) | 7.527 (0.01%) | 653.340 (1%)
• We need a hybrid approach: Campero et al., “Tracking Ephemeral Sentiment Entities in Social Streams”, submitted 2017
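To put one number on the disagreement, the counts reported for the emoticon-based and SentiWordNet-based labelings can be summed directly. The sketch below only re-uses the figures from this slide; the variable names are illustrative.

```python
# Counts from the slide: rows are emoticon-based labels,
# columns are SentiWordNet-based labels.
COLUMNS = ["positive", "negative", "neutral", "zero-sum", "no decision"]
TABLE = {
    "positive": [28_104_677, 10_756_225, 4_908_237, 23_297, 3_140_978],
    "negative": [4_929_947, 3_885_983, 930_075, 7_527, 653_340],
}

# Total number of pure emoticon-labeled tweets.
total = sum(sum(row) for row in TABLE.values())

# Tweets where both labelings give the same polarity (the diagonal cells).
agree = sum(TABLE[label][COLUMNS.index(label)] for label in TABLE)

agreement_rate = agree / total
print("total tweets:", total)
print("agreement: {:.1%}".format(agreement_rate))
```

The total reproduces the 57.340.286 pure-labeled tweets, and the two labelings agree on roughly 56% of them, which is the motivation for a hybrid approach.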

Challenges and opportunities
 Multilinguality
 486.627.464 (English tweets) out of 1.882.387.310 total tweets  we utilize
only 26% of the dataset.
 Add multilingual content
 Transfer learning
 Exploit the content similarity
 Not everyone uses emoticons
 If tweets are similar, “inherit” the sentiment from the “neighboring” tweets
 Exploit the hashtags
 Start with a seed of positive, negative hashtags
 Data augmentation
 Iosifidis & Ntoutsi, “Data Augmentation for Polarized Textual Data for Dealing with Class
Imbalance”, Submitted 2017
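The idea of "inheriting" sentiment from neighboring tweets could look roughly like the sketch below: an unlabeled tweet takes the majority label of its most similar labeled tweets. Jaccard similarity over word sets, the parameters `k` and `min_sim`, and the function name `inherit_label` are assumptions made for illustration; the talk does not specify the similarity measure.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def inherit_label(tweet, labeled, k=3, min_sim=0.2):
    """Majority label among the k most similar labeled tweets.

    Returns None when no labeled tweet reaches the similarity threshold.
    """
    scored = sorted(
        ((jaccard(tweet, text), label) for text, label in labeled),
        reverse=True,
    )
    neighbors = [(sim, label) for sim, label in scored[:k] if sim >= min_sim]
    if not neighbors:
        return None
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical tweets labeled via emoticons.
labeled = [
    ("love my new phone", "positive"),
    ("love this phone so much", "positive"),
    ("network is down again", "negative"),
    ("hate the network outage", "negative"),
]
print(inherit_label("my phone is great love it", labeled))
print(inherit_label("network down again", labeled))
print(inherit_label("good morning everyone", labeled))
```

Returning `None` for tweets without close neighbors keeps uncertain cases unlabeled instead of forcing a guess.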

Challenges and opportunities
 Dealing with class imbalance
 Most opinions are positive (and most reviews are 5-star). How can we build
models that learn all classes well (not just the majority)?
 Dealing with changes
 How does sentiment change over time? How can we build classifiers that react to
change (concept drift)?

Reacting to change
Part of our ongoing work on the OSCAR project
DFG project OSCAR: “Opinion Stream
Classification with Ensembles and Active
leaRners”

Outline

Changing perspectives: Serving emotional content
“At the constitutional level where we work, 90% of any decision is emotional.
The rational part of us supplies the reasons for supporting our predilections.”
- Justice William O. Douglas

Rational appeal
 List benefits

Emotional appeals
 You will be happier, smarter or better looking if you have this item.

The cultural challenge
 A case study of FIAT
 FIAT released an ad in Italy in which actor
Richard Gere drives a Lancia Delta from
Hollywood to Tibet.
 Gere is hated in China for being an
outspoken supporter of the Dalai Lama
 There was a huge online uproar on
Chinese message boards commenting that
they would never buy a FIAT car.

The ephemeral sentiment challenge
 Sentiment trajectory for refugees topic
Source: Multilingual Sentiment Analysis on Data of the Refugee Crisis in Europe, Shalunts and Backfried, Data Analytics 2016

To summarize
 Opinions convey more than just information
 They comprise a great (and free, most of the time) resource for getting to
know your audience/students
 You can use opinionated words/emotions to connect to your
audience/students
 Many tools for sentiment analysis exist out there (some for free, but also
professional ones)
 From an ML point of view
 A challenging problem due to language, lack of labeled data, noisy data,
change and context

Thank you! Questions/ Thoughts?

Contact
Prof. Dr. Eirini Ntoutsi
FG Intelligent Systems
Faculty of Electrical Engineering and Computer Science
Leibniz University Hannover & L3S Research Center
http://www.kbs.uni-hannover.de/~ntoutsi/
ntoutsi@l3s.de
