This document discusses sentiment analysis of social media content. It begins by noting the abundance of opinion-rich resources online due to the social nature of Web 2.0. Sentiment analysis aims to automatically analyze opinions expressed in text to determine sentiment. This is challenging for social media due to informal language, short texts, and continuous new content. The document outlines challenges in building sentiment classifiers from social media data due to the need for large labeled training datasets but few available labels. It describes using emoticons and sentiment dictionaries to implicitly label some data for use in classifier training and evaluates the agreement between these labeling methods.
Sentiment Analysis of Social Media Content: A multi-tool for listening to your audience and developing sentimental content strategies.
1. Machine Learning for Big Data
Prof. Dr. Eirini Ntoutsi
Leibniz University Hannover & L3S Research Center
Sentiment Analysis of Social Media Content
A multi-tool for listening to your audience and
developing sentimental content strategies
EUMade4All Workshop, Hannover, 29.9.2017
2. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
Prof. Dr. Eirini Ntoutsi: Sentiment Analysis of Social Media Content
3. A Web/World of opinions
With the advent of Web 2.0 and its social character, a lot of opinion-rich resources have arisen
7. Why do we care?
Opinions are produced on a constant basis and are (most of the time) freely available
Free feedback from our customers/users
Valuable source of information for companies, politicians [1], decision makers
Companies turn to social media monitoring in order to optimize and strengthen their products and brands
An opportunity for marketers to pay attention to consumers’ feelings towards their brand
People have the power to influence each other in their decisions
Product design could be driven by user requests
[1] https://motherboard.vice.com/en_us/article/mg9vvn/how-our-likes-helped-trump-win
8. Sentiment analysis
Opinions on Vodafone
What are we interested in?
(Automatically) identifying the negative tweets (and reacting … customer care)
9. Aspect-oriented sentiment analysis
It's not ALL good or bad
Reviews from TripAdvisor on Vienna Marriott Hotel
2/5/2014: Great hotel, very nice rooms, perfect location, very nice staff except for a mid-aged female receptionist who tried to
charge me extra for wifi fees when checking out. It was waived at the desk when I checked-in. And she started treating me with
an attitude after she found out that I got a great deal through priceline.com. ….
26/1/2014: Spent a long weekend here. Rooms clean and functional without being spectacular and a nice pool etc. Staff in pool
weren't Good and I found them actually quite rood. Executive lounge was ok and not busy but selection of wine and beer wasn't
great. The reception has many shops and a bar at the end which kind of males it feel like a shopping centre. Overall great for
business travel but not sure id come again for leisure.
7/5/2013: The Vienna Marriott has all you expect; no frills, but solid service and they get all the basic stuff done right.
It's in a fine location, maybe 10 minute walk from the major city attractions while being in a quiet area. Breakfast buffet
exceptional and good fitness center. Very helpful and happy staff.
Lobby lounge just okay. Not a good wine selection and the Sinatra-like singer adds nothing.
Maybe just a little more expensive than it should be, too.
What are we interested in?
What people are talking about (items and item aspects)
The attitude of people towards these items and aspects
10. (Sentiment- & aspect-based) opinion summarization
11. (Sentiment- & aspect-based) opinion summarization
12. Sentiment analysis: an umbrella term
The Sentiment Analysis task
Is a given text positive, negative, or neutral?
Text = a sentence, a tweet, a customer review, a document …
The Emotion Analysis task
What emotion is being expressed in a given piece of text?
Basic emotions: joy, sadness, fear, anger,…
Other emotions: guilt, pride, optimism, frustration,…
The Aspect-oriented Sentiment Analysis task
What are the product/entity aspects discussed in a text?
What is the sentiment of those aspects?
The Summarization task
What are the key aspects in users’ opinions? What is the predominant sentiment?
13. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
14. Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
[Diagram labels: Algorithm, Model, f(x)]
15. Challenges of sentiment analysis in social media
Language-related & medium-related challenges
Informal
Short, 140 characters for tweets
Abbreviations and shortenings
Wide array of topics and large vocabulary
Spelling mistakes and creative spellings
Special strings like hashtags, emoticons, conjoined words
Data properties
Large amounts of opinions (Volume)
Continuous flow of opinions (Velocity)
16. Challenges of sentiment analysis in social media
Sentiment-related challenges
The unambiguous identification of sentiment
Sarcasm
Bipolarity
Dealing with colloquial language
tweets containing colloquial slang
17. Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
Two challenging parts
Learning: How to build a classifier?
Labeling: How to create a (class-labeled) training set?
[Diagram labels: Algorithm, Model, f(x)]
18. How to build a classifier
Preprocessing part
Negations
Colloquial language
Superfluous words
Emoticons
Learning part/ Classifiers
Naïve Bayes
SVMs
Ensembles
Deep Neural Networks
KNNs
…
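The two parts listed on this slide can be sketched in a few lines. The following is a minimal illustration, not the pipeline used in the talk: it assumes a tiny hypothetical stop-word list, a simple regular-expression tokenizer, and a hand-written multinomial Naïve Bayes with add-one smoothing. The names `STOPWORDS`, `preprocess`, and `NaiveBayes` are introduced here for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "is", "it", "this", "to", "and", "of"}


def preprocess(text):
    """Lowercase, drop links and @mentions, tokenize, remove stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z_']+", text)
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        tokens = preprocess(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
train = [
    ("I love this phone", "positive"),
    ("great service and friendly staff", "positive"),
    ("I hate the slow network", "negative"),
    ("terrible support, very bad", "negative"),
]
clf = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("love the friendly staff"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, which matters once the vocabulary grows beyond a toy example.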
19. Preprocessing - Negations
Tagging negations with verbs
I do not like → I NOT_like
It didn't fit → It NOT_fit
27.222.287 found verb negations (0.4%)
Tagging negations with adjectives
2-part adjective co-occurrences
not pretty → ugly
not bad → good
3-part adjective co-occurrences
not very young → old
4.832.573 found adjective negations (0.1%)
Verbs negation list: www.vocabulix.com
Adverbs negation list: www.scribd.com
[Pie chart: negation verbs 85%, negation adjectives 15%]
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017
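The negation rewrites on this slide can be approximated as follows. This is only a sketch under stated assumptions: the antonym, intensifier, and negator tables are tiny hypothetical stand-ins for the external lists cited above, and any word following a negator that is not a known adjective is assumed to be a verb. The function name `tag_negations` is introduced here for illustration.

```python
import re

# Illustrative tables only; the slides cite external negation lists
# but do not reproduce them.
ANTONYMS = {"pretty": "ugly", "bad": "good", "young": "old"}
INTENSIFIERS = {"very", "really", "so"}
NEGATORS = {"not", "don't", "didn't", "doesn't"}
AUXILIARIES = {"do", "did", "does"}


def tag_negations(text):
    """Rewrite negated verbs as NOT_<verb> and negated adjectives as antonyms."""
    tokens = re.findall(r"[A-Za-z']+", text)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Treat "do not", "did not", "does not" as a single negator.
        if (tok.lower() in AUXILIARIES and i + 1 < len(tokens)
                and tokens[i + 1].lower() == "not"):
            i += 1
            tok = tokens[i]
        if tok.lower() in NEGATORS and i + 1 < len(tokens):
            nxt = tokens[i + 1].lower()
            # 3-part pattern: negator + intensifier + adjective.
            if (nxt in INTENSIFIERS and i + 2 < len(tokens)
                    and tokens[i + 2].lower() in ANTONYMS):
                out.append(ANTONYMS[tokens[i + 2].lower()])
                i += 3
                continue
            # 2-part pattern: negator + adjective.
            if nxt in ANTONYMS:
                out.append(ANTONYMS[nxt])
                i += 2
                continue
            # Otherwise assume the next word is a negated verb.
            out.append("NOT_" + tokens[i + 1])
            i += 2
            continue
        out.append(tok)
        i += 1
    return " ".join(out)


for example in ["I do not like", "It didn't fit", "not pretty", "not very young"]:
    print(example, "->", tag_negations(example))
```

Fusing the negation into a single token (`NOT_like`) lets a bag-of-words classifier distinguish "like" from "not like", which it otherwise cannot do because word order is discarded.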
20. Preprocessing effect – Overall view (distinct words)
[Bar chart: number of distinct words (axis from 0 to 300.000.000) for: original, slang, links & mentions, negations, emoticons, stopwords]
Iosifidis & Ntoutsi, “Large scale sentiment learning with limited labels”, KDD 2017
21. (back to) Building a sentiment classifier
Building a sentiment classifier requires data and algorithms
Two challenging parts
Learning: How to build a classifier?
Labeling: How to create a (class-labeled) training set?
[Diagram labels: Algorithm, Model, f(x)]
22. How to create a (class-labeled) training set
Big Data but few labels
Human labelling at this scale is impossible
What other (machine-based) resources can we exploit to label (part of) our data?
At the data level
Labels through emoticons
Labels through sentiment dictionaries (like SentiWordNet)
At the machine learning model level
Use both labeled and unlabeled data for learning → semi-supervised learning
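One common way to use labeled and unlabeled data together is self-training: train on the labeled tweets, label the unlabeled ones the model is confident about, and retrain. The slide does not say which semi-supervised scheme is meant, so the sketch below is an assumption. The tiny Naive Bayes model, the function names, and the `threshold` and `max_rounds` parameters are all illustrative.

```python
import math
from collections import Counter


def train_nb(labeled):
    """Fit word counts per class from a list of (text, label) pairs."""
    word_counts, class_counts = {}, Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, class_counts


def predict_proba(model, text):
    """Return class probabilities using multinomial Naive Bayes with add-one smoothing."""
    word_counts, class_counts = model
    vocab = set().union(*word_counts.values())
    n_docs = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_docs)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp.values())
    return {label: value / norm for label, value in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Iteratively move confidently predicted tweets into the training set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, rest = [], []
        for text in pool:
            probs = predict_proba(model, text)
            best = max(probs, key=probs.get)
            (confident if probs[best] >= threshold else rest).append((text, best))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [text for text, _ in rest]
    return train_nb(labeled), labeled
```

The confidence threshold controls the trade-off: a low value grows the training set quickly but lets wrong pseudo-labels reinforce themselves.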
23. Labels through emoticons
Implicit labels, through emoticons
We assembled a list of positive and negative emoticons
#72 positive class emoticons :-) :) :o) =) ;) (: (; (= <3 :D :-D :oD =D ;D
#70 negative emoticons :( :-( :o( =( ;( ;-( ): ); )=
We classified tweets based on their emoticons
Positive: only positive emoticons (10%)
Negative: only negative emoticons (2%)
Mixed: both positive and negative (1%)
No emoticon (88%)
In total, 57.340.286 (12%) are pure-labeled.
[Pie chart: emoticons_positive 10%, no_emoticons 88%, emoticons_negative 2%, emoticons_mixed 0%]
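The emoticon-based labeling rule can be sketched as follows. The two sets contain only the emoticons shown on the slide (the full lists have 72 and 70 entries). The whitespace tokenization and the name `emoticon_label` are assumptions made for this illustration.

```python
# Excerpts of the emoticon lists, as printed on the slide.
POSITIVE_EMOTICONS = {
    ":-)", ":)", ":o)", "=)", ";)", "(:", "(;", "(=", "<3",
    ":D", ":-D", ":oD", "=D", ";D",
}
NEGATIVE_EMOTICONS = {":(", ":-(", ":o(", "=(", ";(", ";-(", "):", ");", ")="}


def emoticon_label(tweet):
    """Return 'positive', 'negative', 'mixed' or 'none' for a tweet."""
    # Assumption: emoticons are separated from words by whitespace.
    tokens = tweet.split()
    has_positive = any(token in POSITIVE_EMOTICONS for token in tokens)
    has_negative = any(token in NEGATIVE_EMOTICONS for token in tokens)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "none"
```

Only tweets labeled `positive` or `negative` count as pure-labeled. Mixed tweets and tweets without emoticons stay unlabeled.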
24. Labels through SentiWordNet
SentiWordNet: a lexical resource for supporting sentiment classification
Tweet sentiment as an aggregation of the sentiment of its member words
SentiWordNet labeling results
Positive: only positive words
Negative: only negative words
Neutral: only neutral words
Zero-sum: mix of positive and negative
No decision: words do not exist in the lexicon
e.g., #Iloveobama, #refugeecrisis etc
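A lexicon-based labeler along these lines might look like the sketch below. It does not use the real SentiWordNet data: `TOY_LEXICON` is a hypothetical six-word dictionary with made-up polarity scores. Aggregating by summing the word scores is only one possible reading of "aggregation" on the slide.

```python
# Hypothetical polarity scores (positive minus negative), for illustration only.
TOY_LEXICON = {
    "love": 0.75, "great": 0.5,
    "hate": -0.75, "bad": -0.5,
    "car": 0.0, "the": 0.0,
}


def lexicon_label(tweet, lexicon=TOY_LEXICON):
    """Label a tweet by aggregating the scores of the words found in the lexicon."""
    scores = [lexicon[word] for word in tweet.lower().split() if word in lexicon]
    if not scores:
        return "no_decision"  # no word of the tweet exists in the lexicon
    if all(score == 0 for score in scores):
        return "neutral"      # only neutral words
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "zero_sum"         # positive and negative words cancel out
```

Hashtags such as `#Iloveobama` fall into `no_decision` because the static dictionary has no entry for them, which is the limitation the slide points out.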
25. Emoticons vs SentiWordNet
For the intersection (57.340.286 = 12% tweets with pure sentiment-based labels),
we checked agreement in the labels
Causes of disagreement
Emoticons-based labeling
Prone to errors: existence of positive emoticons does not imply positive words
SentiWordNet-based labeling
SentiWordNet is a static dictionary
Twitter is very dynamic
Words change polarity (also based on context)
New words are created (e.g. hashtags) which are not part of the dictionary
Emoticon-based labeling (rows) vs. SentiWordNet-based labeling (columns)
            Positive            Negative            Neutral           Zero sum          No-decision
Positive    28.104.677 (49%)    10.756.225 (19%)    4.908.237 (9%)    23.297 (0.04%)    3.140.978 (5%)
Negative    4.929.947 (9%)      3.885.983 (7%)      930.075 (2%)      7.527 (0.01%)     653.340 (1%)
• We need a hybrid approach:
Campero et al., “Tracking Ephemeral Sentiment Entities in Social Streams”, submitted 2017
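The agreement between the two labeling methods can be computed directly from the counts in the table above. The snippet is a small arithmetic sketch. The dictionary layout and the variable names are illustrative, while the numbers are the ones reported on the slide.

```python
# Rows: emoticon-based label. Columns: SentiWordNet-based label.
COUNTS = {
    "positive": {
        "positive": 28_104_677, "negative": 10_756_225, "neutral": 4_908_237,
        "zero_sum": 23_297, "no_decision": 3_140_978,
    },
    "negative": {
        "positive": 4_929_947, "negative": 3_885_983, "neutral": 930_075,
        "zero_sum": 7_527, "no_decision": 653_340,
    },
}

# All tweets with a pure emoticon-based label.
total = sum(sum(row.values()) for row in COUNTS.values())

# Tweets on which both methods assign the same polarity.
agreeing = sum(COUNTS[label][label] for label in COUNTS)

agreement_rate = agreeing / total
```

Here `total` equals the 57.340.286 pure-labeled tweets, and the two methods agree on roughly 56% of them. The remainder are direct contradictions or tweets SentiWordNet labels neutral, zero-sum or no-decision.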
26. Challenges and opportunities
Multilinguality
486.627.464 (English tweets) out of 1.882.387.310 total tweets → we utilize only 26% of the dataset.
Add multilingual content
Transfer learning
Exploit the content similarity
Not everyone uses emoticons
If tweets are similar, “inherit” the sentiment from the “neighboring” tweets
Exploit the hashtags
Start with a seed of positive, negative hashtags
Data augmentation
Iosifidis & Ntoutsi, “Data Augmentation for Polarized Textual Data for Dealing with Class Imbalance”, submitted 2017
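The idea of letting a tweet "inherit" sentiment from similar labeled tweets can be illustrated with a nearest-neighbour vote. The slides do not specify a similarity measure, so the Jaccard similarity over word sets, the function names, and the `k` and `min_similarity` parameters below are assumptions made for this sketch.

```python
from collections import Counter


def jaccard(text_a, text_b):
    """Word-set overlap between two tweets, in [0, 1]."""
    words_a, words_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def inherit_label(tweet, labeled, k=3, min_similarity=0.2):
    """Majority label of the k most similar labeled tweets, or None if none is close enough."""
    scored = sorted(
        ((jaccard(tweet, text), label) for text, label in labeled),
        reverse=True,
    )
    neighbours = [label for similarity, label in scored[:k] if similarity >= min_similarity]
    if not neighbours:
        return None
    return Counter(neighbours).most_common(1)[0][0]
```

Returning `None` for tweets without sufficiently similar neighbours keeps the propagated labels conservative. The labeled seed could come from emoticons or from a seed list of positive and negative hashtags.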
27. Challenges and opportunities
Dealing with class imbalance
Most of the opinions/reviews are positive (5*, respectively). How can we build models that learn all classes well (not just the majority)?
Dealing with changes
How does sentiment change over time? How can we build classifiers that react to change (concept drift)?
28. Reacting to change
Part of our ongoing work on the OSCAR project
DFG project OSCAR: “Opinion Stream Classification with Ensembles and Active leaRners”
29. Outline
A world of opinions
Analyzing opinions for sentiment
Using sentimental content
30. Changing perspectives: Serving emotional content
“At the constitutional level where we work, 90% of any decision is emotional. The rational part of us supplies the reasons for supporting our predilections.”
- Justice William O. Douglas
32. Emotional appeals
You will be happier, smarter or better looking if you have this item.
33. The cultural challenge
A case study of FIAT
FIAT released an ad in Italy in which actor Richard Gere drives a Lancia Delta from Hollywood to Tibet.
Gere is hated in China for being an outspoken supporter of the Dalai Lama.
There was a huge online uproar on Chinese message boards, with commenters saying they would never buy a FIAT car.
34. The ephemeral sentiment challenge
Sentiment trajectory for refugees topic
Source: Multilingual Sentiment Analysis on Data of the Refugee Crisis in Europe, Shalunts and Backfried, Data Analytics 2016
35. To summarize
Opinions convey more than just information
They comprise a great (and free, most of the time) resource for getting to know your audience/students
You can use opinionated words/emotions to connect to your audience/students
Many tools for sentiment analysis exist out there (some for free, but also professional ones)
From an ML point of view
A challenging problem due to language, lack of labeled data, noisy data, change and context
36. Thank you! Questions/ Thoughts?
37. Contact
Prof. Dr. Eirini Ntoutsi
FG Intelligent Systems
Faculty of Electrical Engineering and Computer Science
Leibniz University Hannover & L3S Research Center
http://www.kbs.uni-hannover.de/~ntoutsi/
ntoutsi@l3s.de