SlideShare a Scribd company logo
1 of 39
Download to read offline
Dr. Diana Maynard
University of Sheffield, UK
Do we really know what people
mean when they tweet?
We are all connected to each other...
● Information,
thoughts and
opinions are
shared
prolifically on
the social web
these days
● 72% of online
adults use social
networking sites
Your grandmother is three times as likely to
use a social networking site now as in 2009
●
In Britain and the
US, approx 1 hour
a day on social media
●
30 % of Americans get
all their news from
Facebook
●
Facebook has more
users than the whole of
the Internet in 2005
●
40% of Twitter users
don't actually tweet
anything
What are people reading about?
●
Of the top 10 Twitter accounts
with the highest number of
followers:
●
7 pop stars
●
2 social media sites
●
and Barack Obama
●
As you can imagine, there's a lot
of mindless drivel on social
media sites
There can be surprising value in mindless drivel
Opinion mining is about finding out what
people think...
Voice of the Customer
●
Someone who wants to buy a camera
●
Looks for comments and reviews
●
Someone who just bought a camera
●
Comments on it
●
Writes about their experience
●
Camera Manufacturer
●
Gets feedback from customer
●
Improves their products
●
Adjusts marketing strategies
More than just analysing product reviews
●
Understanding customer reviews and so on is a huge
business
●
But also:
●
Tracking political opinions: what events make people
change their minds?
●
How does public mood influence the stock market,
consumer choices etc?
●
How are opinions distributed in relation to
demographics?
●
NLP tools are crucial in order to make sense of all the
information
“Climate change? Oh great!” says Justin Bieber
What else do we want to investigate?
●
Decarbonet project: investigating public perception of
climate change
●
How do opinions change over time and what causes these
changes?
●
How can we influence opinion?
●
Sarcasm detection
●
What social media content should be preserved/forgotten?
Interestingness of opinions (ARCOMEM and ForgetIt
projects)
But there are lots of tools that “analyse”
social media already....
●
Streamcrab http://www.streamcrab.com/
●
Semantria http://semantria.com
●
Social Mention http://socialmention.com/
●
Sentiment140: http://www.sentiment140.com/
●
TipTop: http://feeltiptop.com/
Why are these sites unsuccessful?
●
They don't work well at more than a very basic level
●
They mainly use dictionary lookup for positive and
negative words
●
Or they use ML, which only works for text that's similar in
style to the training data
●
They classify the tweets as positive or negative, but not
with respect to the keyword you're searching for
●
keyword search just retrieves any tweet mentioning it,
but not necessarily about it as a topic
●
no correlation between the keyword and the sentiment
“Positive” tweets about fracking
●
Help me stop fracking. Sign the petition to David
Cameron for a #frack-free UK now!
●
I'll take it as a sign that the gods applaud my new anti-
fracking country love song.
●
#Cameron wants to change the law to allow #fracking
under homes without permission. Tell him NO!!!!!
It's not just about looking at sentiment words
“It's a great movie if you have the taste and sensibilities of a 5-year-
old boy.”
“I'm not happy that John did so well in the debate last night.”
“I'd have liked the film if it had been shorter.”
“You're an idiot.”
●
Conditional statements, content clauses, situational context can
all play a role
Whitney Houston wasn't very popular...
Death confuses opinion mining tools

Opinion mining
tools are good for a
general overview,
but not for some
situations
Opinion spamming
Spam opinion detection (fake reviews)
●
Sometimes people get paid to post “spam” opinions supporting
a product, organisation or even government
●
An article in the New York Times discussed one such company
who gave big discounts to post a 5-star review about the
product on Amazon
●
http://www.nytimes.com/2012/01/27/technology/for-2-a-star-
a-retailer-gets-5-star-reviews.html?_r=3&ref=business
●
Could be either positive or negative opinions
●
Generally, negative opinions are more damaging than positive
ones
How to detect fake opinions?
●
Review content: lexical features, content and style
inconsistencies from the same user, or similarities
between different users
●
Complex relationships between reviews, reviewers
and products
●
Publicly available information about posters (time
posted, posting frequency etc)
Irony and sarcasm
• I had never seen snow in Holland before but thanks to twitter and
facebook I now know what it looks like. Thanks guys, awesome!
• Life's too short, so be sure to read as many articles about
celebrity breakups as possible.
• I feel like there aren't enough singing competitions on TV .
#sarcasmexplosion
• I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm
#bitchtweet
• On a bright note if downing gets injured we have Henderson to
come in
How do you know when someone is being
sarcastic?
●
Use of hashtags in tweets such as #sarcasm, emoticons etc.
●
Large collections of tweets based on hashtags can be used to
make a training set for machine learning
●
But you still have to know which bit of the tweet is the sarcastic
bit
Man , I hate when I get those chain letters & I don't resend them , then
I die the next day .. #Sarcasm
I am not happy that I woke up at 5:15 this morning. #greatstart
#sarcasm
You are really mature. #lying #sarcasm
Case study:
Rule-based Opinion Mining on Tweets
Why Rule-based?
●
Although ML applications are typically used for
Opinion Mining, there are advantages to using a rule-
based approach when training data isn't easily
available
●
For example, working with multiple languages and/or
domains
●
Rule-based system is more easily adaptable
●
Novel use of language and grammar makes ML hard
GATE Components
●
TwitIE
●
structural and linguistic pre-processing, specific
to Twitter
●
includes language detection, hashtag
retokenisation, POS tagging, NER
●
(Optional) term recognition using TermRaider
●
Sentiment gazetteer lookup
●
JAPE opinion detection grammars
●
(Optional) aggregation of opinions
●
includes opinion interestingness component
Basic approach for opinion finding
●
Find sentiment-containing words in a linguistic
relation with terms/entities (opinion-target
matching)
●
Dictionaries give a starting score for sentiment words
●
Use a number of linguistic sub-components to deal
with issues such as negatives, adverbial modification,
swear words, conditionals, sarcasm etc.
●
Starting from basic sentiment lookup, we then adjust
the scores and polarity of the opinions via these
components
A positive sentiment list
●
awesome category=adjective score=0.5
●
beaming category=adjective score=0.5
●
becharm category=verb score=0.5
●
belonging category=noun score=0.5
●
benefic category=adjective score=0.5
●
benevolentlycategory=adverb score=0.5
●
caring category=noun score=0.5
●
charitable category=adjective score=0.5
●
charm category=verb score=0.5
A negative sentiment list
Examples of phrases following the word “go”:
●
down the pan
●
down the drain
●
to the dogs
●
downhill
●
pear-shaped
Opinion scoring
●
Sentiment gazetteers (developed from sentiment words in
WordNet) have a starting “strength” score
●
These get modified by context words, e.g. adverbs, swear
words, negatives and so on
The film was awesome --> The film was really amazing.
The film was awful --> The film was absolutely awful..
The film was good --> The film was not so good.
●
Swear words modifying adjectives count as intensifiers
The film was good --> The film was damned good.
●
Swear words on their own are classified as negative
Damned politicians.
Example: Opinions on Greek Crisis
Using Machine Learning for the task
●
If we can collect enough manually annotated training data, we
can also use an ML approach for this task
●
Product reviews: use star-based rating (but these have flaws)
●
Other domains, e.g. politics: classify sentences or tweets (the
ML instances), many of which do not contain opinions.
●
So the ML classes will be positive, neutral and negative
●
(Some people classify neutral and no opinion as distinct classes,
but we find the distinction too difficult to make reliably)
Training on tweets
●
You can use hashtags as a source of classes
●
Example: collect a set of tweets with the #angry
tag, and a set without it, and delete from the
second set any tweets that look angry
●
Remove the #angry tag from the text in the first
set (so you're not just training the ML to spot the
tag)
●
You now have a corpus of manually annotated
angry/non-angry data
●
This approach can work well, but if you have huge
datasets, you may not be able to do the manual deletions
●
You can also train on things like #sarcasm and #irony
Summary
●
Opinion mining is hard and therefore error-prone (despite
what vendors will tell you about how great their product is)
●
There are many types of sentiment analysis, and many different
uses, each requiring a different solution
●
It's very unlikely that an off-the-shelf tool will do exactly what
you want, and even if it does, performance may be low
●
Opinion mining tools need to be customised to the task and
domain
●
Anything that involves processing social media (especially
messy stuff like Facebook posts and Twitter) is even harder, and
likely to have lower performance
●
For tasks that mainly look at aggregated data, this isn't such an
issue, but for getting specific sentiment on individual
posts/reviews etc, this is more problematic
So where does this leave us?
●
Opinion mining is ubiquitous, but it's still far from perfect,
especially on social media
●
There are lots of linguistic and social quirks that fool
sentiment analysis tools.
●
The good news is that this means there are lots of
interesting problems for us to research
●
And it doesn’t mean we shouldn’t use existing opinion
mining tools
●
The benefits of a modular approach mean that we can pick
the bits that are most useful
●
Take-away message: use the right tool for the right job
More information
●
GATE http://gate.ac.uk (general info, download, tutorials, demos,
references etc)
●
The EU-funded Decarbonet and TrendMiner projects are dealing with lots
of issues about opinion and trend mining from social media
●
http://www.decarbonet.eu
●
http://www.trendminer-project.eu/
●
Tutorials
●
Module 12 of the annual GATE training course: “Opinion
Mining” (2013 version)
http://gate.ac.uk/wiki/TrainingCourseJune2013/
●
Module 14 of the annual GATE training course: “GATE for
social media mining”
Some GATE-related opinion mining papers
(available from http://gate.ac.uk/gate/doc/papers.html)
●
Diana Maynard, Gerhard Gossen, Marco Fisichella, Adam Funk. Should I care about
your opinion? Detection of opinion interestingness and dynamics in social media. To
appear in Journal of Future Internet, Special Issue on Archiving Community Memories,
2014.
●
Diana Maynard and Mark A. Greenwood. Who cares about sarcastic tweets?
Investigating the impact of sarcasm on sentiment analysis. Proc. of LREC 2014,
Reykjavik, Iceland, May 2014.
●
Diana Maynard, David Dupplaw, Jonathon Hare. Multimodal Sentiment Analysis of
Social Media. Proc. of BCS SGAI Workshop on Social Media Analysis, Dec 2013
●
D. Maynard and K. Bontcheva and D. Rout. Challenges in developing opinion mining
tools for social media. In Proceedings of @NLP can u tag #usergeneratedcontent?!
Workshop at LREC 2012, May 2012, Istanbul, Turkey.
●
D. Maynard and A. Funk. Automatic detection of political opinions in tweets. In Raúl
García-Castro, Dieter Fensel and Grigoris Antoniou (eds.) The Semantic Web: ESWC
2011 Selected Workshop Papers, Lecture Notes in Computer Science, Springer, 2011.
●
H.Saggion, A.Funk: Extracting Opinions and Facts for Business Intelligence. Revue
des Nouvelles Technologies de l’Information (RNTI), no. E-17 pp119-146; 2009.
Some demos to try
●
http://sentiment.christopherpotts.net/lexicon/
Get sentiment scores for single words from a variety of
sentiment lexicons
●
http://sentiment.christopherpotts.net/textscores/
Show how a variety of lexicons score novel texts
●
http://sentiment.christopherpotts.net/classify/
Classify tweets according to various probabilistic classifier
models
●
http://demos.gate.ac.uk/arcomem/opinions/
Find and classify opinionated text, using GATE-based
ARCOMEM system
Questions?

More Related Content

Similar to Understanding Sentiment in Social Media: Challenges and Opportunities

Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social mediaDiana Maynard
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Diana Maynard
 
Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)nilamapatel
 
Social media basics for college
Social media basics for collegeSocial media basics for college
Social media basics for collegeGenevieve Howard
 
Twitter Presentation
Twitter PresentationTwitter Presentation
Twitter PresentationChris Hunter
 
Doing customer development (and stop wasting your time) - StartupBus edition
Doing customer development (and stop wasting your time) -  StartupBus editionDoing customer development (and stop wasting your time) -  StartupBus edition
Doing customer development (and stop wasting your time) - StartupBus editionHans van Gent
 
Website 102 google slides
Website 102   google slidesWebsite 102   google slides
Website 102 google slidesMary Heston
 
Aardman Social Media Training Slides
Aardman Social Media Training SlidesAardman Social Media Training Slides
Aardman Social Media Training SlidesNoisy Little Monkey
 
Social Media for Professional Learning
Social Media for Professional LearningSocial Media for Professional Learning
Social Media for Professional LearningSandy Kendell
 
Effectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningEffectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningBobby Dodd
 
Best Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaBest Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaAlex Garrido
 
#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social MediaMel DePaoli, MBA
 
Using Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalUsing Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalBurriss Consulting, Inc.
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real worldDiana Maynard
 
Social media for the voice actor
Social media for the voice actorSocial media for the voice actor
Social media for the voice actorvoicesforall
 

Similar to Understanding Sentiment in Social Media: Challenges and Opportunities (20)

Multimodal opinion mining from social media
Multimodal opinion mining from social mediaMultimodal opinion mining from social media
Multimodal opinion mining from social media
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)Social Media: Friend or Foe (May 2014)
Social Media: Friend or Foe (May 2014)
 
Cls8 decarbonet
Cls8 decarbonetCls8 decarbonet
Cls8 decarbonet
 
Slide Presentation Week 9
Slide Presentation Week 9Slide Presentation Week 9
Slide Presentation Week 9
 
Social media basics for college
Social media basics for collegeSocial media basics for college
Social media basics for college
 
Social Media in Short- January 2014
Social Media in Short- January 2014Social Media in Short- January 2014
Social Media in Short- January 2014
 
Twitter Presentation
Twitter PresentationTwitter Presentation
Twitter Presentation
 
Doing customer development (and stop wasting your time) - StartupBus edition
Doing customer development (and stop wasting your time) -  StartupBus editionDoing customer development (and stop wasting your time) -  StartupBus edition
Doing customer development (and stop wasting your time) - StartupBus edition
 
Website 102 google slides
Website 102   google slidesWebsite 102   google slides
Website 102 google slides
 
Aardman Social Media Training Slides
Aardman Social Media Training SlidesAardman Social Media Training Slides
Aardman Social Media Training Slides
 
Social Media for Professional Learning
Social Media for Professional LearningSocial Media for Professional Learning
Social Media for Professional Learning
 
Social Degital Media Basics
Social Degital Media BasicsSocial Degital Media Basics
Social Degital Media Basics
 
Effectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and LearningEffectively Using Technology for Leading and Learning
Effectively Using Technology for Leading and Learning
 
Twitter Smarter in 2016
Twitter Smarter in 2016 Twitter Smarter in 2016
Twitter Smarter in 2016
 
Best Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social MediaBest Practices for Creating Digital Content to Share in Social Media
Best Practices for Creating Digital Content to Share in Social Media
 
#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media#sbIMPACT: Getting Started with Social Media
#sbIMPACT: Getting Started with Social Media
 
Using Social Media as a Project Management Professional
Using Social Media as a Project Management ProfessionalUsing Social Media as a Project Management Professional
Using Social Media as a Project Management Professional
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Social media for the voice actor
Social media for the voice actorSocial media for the voice actor
Social media for the voice actor
 

More from Diana Maynard

Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social mediaDiana Maynard
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayDiana Maynard
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Diana Maynard
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferencesDiana Maynard
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Diana Maynard
 
The language of social media
The language of social mediaThe language of social media
The language of social mediaDiana Maynard
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-searchDiana Maynard
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Diana Maynard
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsDiana Maynard
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Diana Maynard
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...Diana Maynard
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEDiana Maynard
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaDiana Maynard
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisDiana Maynard
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATEDiana Maynard
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEDiana Maynard
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDiana Maynard
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Diana Maynard
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaDiana Maynard
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisDiana Maynard
 

More from Diana Maynard (20)

Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social media
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...
 
The language of social media
The language of social mediaThe language of social media
The language of social media
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 
GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 
Practical Opinion Mining for Social Media
Practical Opinion Mining for Social MediaPractical Opinion Mining for Social Media
Practical Opinion Mining for Social Media
 
A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 

Recently uploaded

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Understanding Sentiment in Social Media: Challenges and Opportunities

  • 1. Dr. Diana Maynard University of Sheffield, UK Do we really know what people mean when they tweet?
  • 2. We are all connected to each other... ● Information, thoughts and opinions are shared prolifically on the social web these days ● 72% of online adults use social networking sites
  • 3. Your grandmother is three times as likely to use a social networking site now as in 2009
  • 4. ● In Britain and the US, approx 1 hour a day on social media ● 30 % of Americans get all their news from Facebook ● Facebook has more users than the whole of the Internet in 2005 ● 40% of Twitter users don't actually tweet anything
  • 5. What are people reading about? ● Of the top 10 Twitter accounts with the highest number of followers: ● 7 pop stars ● 2 social media sites ● and Barack Obama ● As you can imagine, there's a lot of mindless drivel on social media sites
  • 6. There can be surprising value in mindless drivel
  • 7.
  • 8. Opinion mining is about finding out what people think...
  • 9. Voice of the Customer ● Someone who wants to buy a camera ● Looks for comments and reviews ● Someone who just bought a camera ● Comments on it ● Writes about their experience ● Camera Manufacturer ● Gets feedback from customer ● Improves their products ● Adjusts marketing strategies
  • 10. More than just analysing product reviews ● Understanding customer reviews and so on is a huge business ● But also: ● Tracking political opinions: what events make people change their minds? ● How does public mood influence the stock market, consumer choices etc? ● How are opinions distributed in relation to demographics? ● NLP tools are crucial in order to make sense of all the information
  • 11. “Climate change? Oh great!” says Justin Bieber
  • 12. What else do we want to investigate? ● Decarbonet project: investigating public perception of climate change ● How do opinions change over time and what causes these changes? ● How can we influence opinion? ● Sarcasm detection ● What social media content should be preserved/forgotten? Interestingness of opinions (ARCOMEM and ForgetIt projects)
  • 13. But there are lots of tools that “analyse” social media already.... ● Streamcrab http://www.streamcrab.com/ ● Semantria http://semantria.com ● Social Mention http://socialmention.com/ ● Sentiment140: http://www.sentiment140.com/ ● TipTop: http://feeltiptop.com/
  • 14. Why are these sites unsuccessful? ● They don't work well at more than a very basic level ● They mainly use dictionary lookup for positive and negative words ● Or they use ML, which only works for text that's similar in style to the training data ● They classify the tweets as positive or negative, but not with respect to the keyword you're searching for ● keyword search just retrieves any tweet mentioning it, but not necessarily about it as a topic ● no correlation between the keyword and the sentiment
  • 15. “Positive” tweets about fracking ● Help me stop fracking. Sign the petition to David Cameron for a #frack-free UK now! ● I'll take it as a sign that the gods applaud my new anti- fracking country love song. ● #Cameron wants to change the law to allow #fracking under homes without permission. Tell him NO!!!!!
  • 16. It's not just about looking at sentiment words “It's a great movie if you have the taste and sensibilities of a 5-year- old boy.” “I'm not happy that John did so well in the debate last night.” “I'd have liked the film if it had been shorter.” “You're an idiot.” ● Conditional statements, content clauses, situational context can all play a role
  • 17. Whitney Houston wasn't very popular...
  • 18. Death confuses opinion mining tools  Opinion mining tools are good for a general overview, but not for some situations
  • 20. Spam opinion detection (fake reviews) ● Sometimes people get paid to post “spam” opinions supporting a product, organisation or even government ● An article in the New York Times discussed one such company who gave big discounts to post a 5-star review about the product on Amazon ● http://www.nytimes.com/2012/01/27/technology/for-2-a-star- a-retailer-gets-5-star-reviews.html?_r=3&ref=business ● Could be either positive or negative opinions ● Generally, negative opinions are more damaging than positive ones
  • 21. How to detect fake opinions? ● Review content: lexical features, content and style inconsistencies from the same user, or similarities between different users ● Complex relationships between reviews, reviewers and products ● Publicly available information about posters (time posted, posting frequency etc)
  • 22. Irony and sarcasm • I had never seen snow in Holland before but thanks to twitter and facebook I now know what it looks like. Thanks guys, awesome! • Life's too short, so be sure to read as many articles about celebrity breakups as possible. • I feel like there aren't enough singing competitions on TV . #sarcasmexplosion • I wish I was cool enough to stalk my ex-boyfriend ! #sarcasm #bitchtweet • On a bright note if downing gets injured we have Henderson to come in
  • 23. How do you know when someone is being sarcastic? ● Use of hashtags in tweets such as #sarcasm, emoticons etc. ● Large collections of tweets based on hashtags can be used to make a training set for machine learning ● But you still have to know which bit of the tweet is the sarcastic bit Man , I hate when I get those chain letters & I don't resend them , then I die the next day .. #Sarcasm I am not happy that I woke up at 5:15 this morning. #greatstart #sarcasm You are really mature. #lying #sarcasm
  • 24. Case study: Rule-based Opinion Mining on Tweets
  • 25. Why Rule-based? ● Although ML applications are typically used for Opinion Mining, there are advantages to using a rule- based approach when training data isn't easily available ● For example, working with multiple languages and/or domains ● Rule-based system is more easily adaptable ● Novel use of language and grammar makes ML hard
  • 26. GATE Components ● TwitIE ● structural and linguistic pre-processing, specific to Twitter ● includes language detection, hashtag retokenisation, POS tagging, NER ● (Optional) term recognition using TermRaider ● Sentiment gazetteer lookup ● JAPE opinion detection grammars ● (Optional) aggregation of opinions ● includes opinion interestingness component
  • 27. Basic approach for opinion finding ● Find sentiment-containing words in a linguistic relation with terms/entities (opinion-target matching) ● Dictionaries give a starting score for sentiment words ● Use a number of linguistic sub-components to deal with issues such as negatives, adverbial modification, swear words, conditionals, sarcasm etc. ● Starting from basic sentiment lookup, we then adjust the scores and polarity of the opinions via these components
  • 28. A positive sentiment list ● awesome category=adjective score=0.5 ● beaming category=adjective score=0.5 ● becharm category=verb score=0.5 ● belonging category=noun score=0.5 ● benefic category=adjective score=0.5 ● benevolentlycategory=adverb score=0.5 ● caring category=noun score=0.5 ● charitable category=adjective score=0.5 ● charm category=verb score=0.5
  • 29. A negative sentiment list Examples of phrases following the word “go”: ● down the pan ● down the drain ● to the dogs ● downhill ● pear-shaped
  • 30. Opinion scoring ● Sentiment gazetteers (developed from sentiment words in WordNet) have a starting “strength” score ● These get modified by context words, e.g. adverbs, swear words, negatives and so on The film was awesome --> The film was really amazing. The film was awful --> The film was absolutely awful.. The film was good --> The film was not so good. ● Swear words modifying adjectives count as intensifiers The film was good --> The film was damned good. ● Swear words on their own are classified as negative Damned politicians.
  • 31. Example: Opinions on Greek Crisis
  • 32. Using Machine Learning for the task ● If we can collect enough manually annotated training data, we can also use an ML approach for this task ● Product reviews: use star-based rating (but these have flaws) ● Other domains, e.g. politics: classify sentences or tweets (the ML instances), many of which do not contain opinions. ● So the ML classes will be positive, neutral and negative ● (Some people classify neutral and no opinion as distinct classes, but we find the distinction too difficult to make reliably)
  • 33. Training on tweets ● You can use hashtags as a source of classes ● Example: collect a set of tweets with the #angry tag, and a set without it, and delete from the second set any tweets that look angry ● Remove the #angry tag from the text in the first set (so you're not just training the ML to spot the tag) ● You now have a corpus of manually annotated angry/non-angry data ● This approach can work well, but if you have huge datasets, you may not be able to do the manual deletions ● You can also train on things like #sarcasm and #irony
  • 34. Summary ● Opinion mining is hard and therefore error-prone (despite what vendors will tell you about how great their product is) ● There are many types of sentiment analysis, and many different uses, each requiring a different solution ● It's very unlikely that an off-the-shelf tool will do exactly what you want, and even if it does, performance may be low ● Opinion mining tools need to be customised to the task and domain ● Anything that involves processing social media (especially messy stuff like Facebook posts and Twitter) is even harder, and likely to have lower performance ● For tasks that mainly look at aggregated data, this isn't such an issue, but for getting specific sentiment on individual posts/reviews etc, this is more problematic
  • 35. So where does this leave us? ● Opinion mining is ubiquitous, but it's still far from perfect, especially on social media ● There are lots of linguistic and social quirks that fool sentiment analysis tools. ● The good news is that this means there are lots of interesting problems for us to research ● And it doesn’t mean we shouldn’t use existing opinion mining tools ● The benefits of a modular approach mean that we can pick the bits that are most useful ● Take-away message: use the right tool for the right job
  • 36. More information ● GATE http://gate.ac.uk (general info, download, tutorials, demos, references etc) ● The EU-funded Decarbonet and TrendMiner projects are dealing with lots of issues about opinion and trend mining from social media ● http://www.decarbonet.eu ● http://www.trendminer-project.eu/ ● Tutorials ● Module 12 of the annual GATE training course: “Opinion Mining” (2013 version) http://gate.ac.uk/wiki/TrainingCourseJune2013/ ● Module 14 of the annual GATE training course: “GATE for social media mining”
  • 37. Some GATE-related opinion mining papers (available from http://gate.ac.uk/gate/doc/papers.html) ● Diana Maynard, Gerhard Gossen, Marco Fisichella, Adam Funk. Should I care about your opinion? Detection of opinion interestingness and dynamics in social media. To appear in Journal of Future Internet, Special Issue on Archiving Community Memories, 2014. ● Diana Maynard and Mark A. Greenwood. Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. Proc. of LREC 2014, Reykjavik, Iceland, May 2014. ● Diana Maynard, David Dupplaw, Jonathon Hare. Multimodal Sentiment Analysis of Social Media. Proc. of BCS SGAI Workshop on Social Media Analysis, Dec 2013 ● D. Maynard and K. Bontcheva and D. Rout. Challenges in developing opinion mining tools for social media. In Proceedings of @NLP can u tag #usergeneratedcontent?! Workshop at LREC 2012, May 2012, Istanbul, Turkey. ● D. Maynard and A. Funk. Automatic detection of political opinions in tweets. In Raúl García-Castro, Dieter Fensel and Grigoris Antoniou (eds.) The Semantic Web: ESWC 2011 Selected Workshop Papers, Lecture Notes in Computer Science, Springer, 2011. ● H.Saggion, A.Funk: Extracting Opinions and Facts for Business Intelligence. Revue des Nouvelles Technologies de l’Information (RNTI), no. E-17 pp119-146; 2009.
  • 38. Some demos to try ● http://sentiment.christopherpotts.net/lexicon/ Get sentiment scores for single words from a variety of sentiment lexicons ● http://sentiment.christopherpotts.net/textscores/ Show how a variety of lexicons score novel texts ● http://sentiment.christopherpotts.net/classify/ Classify tweets according to various probabilistic classifier models ● http://demos.gate.ac.uk/arcomem/opinions/ Find and classify opinionated text, using GATE-based ARCOMEM system