Introduction:
The present work was developed to participate in the challenge
promoted by Rosette, exemplifying the combined use of RapidMiner software and the Rosette
extensions for RapidMiner (see https://www.rosette.com/calling-all-data-scientists/).

It is worth pointing out that none of the presented data or results can be considered conclusive
regarding a good or bad assessment of the proposed scenario. The goal of this work
is only to promote the combined use of the two technologies mentioned above. The sample
size was not chosen so as to make the results representative of the proposed scenario.
  
  
1.  Objectives:
- To analyze the relationship between the current economic and political moment in Brazil
and the international scenario, based on English-language tweets posted on Twitter
(www.twitter.com) containing the terms "Brazil" and "Michel Temer".

- The choice of the terms "Brazil" and "Michel Temer" was based on the fact that these two
terms are intrinsically linked to Brazil's political and economic scenario. There are no political,
legal or personal motivations behind this choice.
  
	
  
1.1 Specific objectives:	
  
	
  
1.1.1: Analyze the statistical correlation between the number of retweets and the sentiment
perceived in each original tweet (positive, neutral or negative), evaluating which type of
message spreads most among the analyzed tweets, taking the number of retweets as the
measure of spread;

1.1.2: Categorize the tweets, find the main entities in specific categories, and analyze the
sentiment of each tweet by relevant category, in order to assess the political
and economic scenario of Brazil, the focus of this study.
  
  
2. Description:
2.1 First stage:
At this stage, using RapidMiner, we collected tweets in English containing the term "Brazil" and
tweets containing the term "Michel Temer", the president of Brazil at the time. Two
RapidMiner operators were used: (i) Search Twitter, which connects RapidMiner to the Twitter
API and retrieves the tweets matching the parameterized query, and (ii) Write Database, which
writes those tweets into a local MySQL database. Writing the tweets into a database makes it
possible to accumulate them over several days and thus build a larger dataset.
Tweets for the two terms were accumulated from December 1st to December 17th, 2016, within
the limit imposed by the Twitter API, which does not provide data that is more than a week old.
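The accumulation step can be illustrated outside RapidMiner with a minimal Python sketch. It is not the process used in this work: it swaps MySQL for an in-memory SQLite database, stores everything in a single table for brevity, and the table layout, the `store_tweets` helper and the sample tweets are all hypothetical.

```python
import sqlite3


def store_tweets(conn, search_term, tweets):
    """Append tweets to a local table, skipping tweet IDs that are already stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id INTEGER PRIMARY KEY,
               search_term TEXT NOT NULL,
               created_at TEXT NOT NULL,
               text TEXT NOT NULL,
               retweet_count INTEGER NOT NULL
           )"""
    )
    rows = [
        (t["id"], search_term, t["created_at"], t["text"], t["retweet_count"])
        for t in tweets
    ]
    # INSERT OR IGNORE keeps the first stored copy when daily searches overlap.
    conn.executemany("INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


# Hypothetical results of two daily searches; tweet 2 is returned on both days.
day1 = [
    {"id": 1, "created_at": "2016-12-01", "text": "sample tweet A", "retweet_count": 4},
    {"id": 2, "created_at": "2016-12-01", "text": "sample tweet B", "retweet_count": 0},
]
day2 = [
    {"id": 2, "created_at": "2016-12-01", "text": "sample tweet B", "retweet_count": 0},
    {"id": 3, "created_at": "2016-12-02", "text": "sample tweet C", "retweet_count": 9},
]

conn = sqlite3.connect(":memory:")
store_tweets(conn, "Brazil", day1)
total = store_tweets(conn, "Brazil", day2)
print(total)
```

Keying the table on the tweet ID is one simple way to keep repeated searches from inflating the dataset with duplicates.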
  
  
RapidMiner Process: Search Twitter → Write Database
2.2 Second stage:
	
  
At this stage, a RapidMiner process was assembled using the Read Database operator to access
the database with the stored tweets, followed by the Analyze Sentiment operator of the
Rosette Text Toolkit extension, which calls the external API and classifies the sentiment
perceived in each tweet as positive, neutral or negative. After the
sentiment classification, the Map operator was inserted to convert the pos, neu or neg outcomes
into the numerical values 1, 0 or -1, respectively. This conversion is needed by the
Correlation Matrix operator that follows, which requires numeric variables. From the correlation matrix
generated as the final result, we obtain the statistical correlation between the Sentiment
column and the Retweet-Count column. This answers one of the questions
previously defined as an objective of this study: does the way a message spreads through
retweets depend on its positive, neutral or negative content?
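The Map and Correlation Matrix steps can be sketched in a few lines of Python. This is only an illustration of the calculation, not the RapidMiner process itself. The sample tweets are made up, and `pearson` is a hand-written helper implementing the standard Pearson coefficient.

```python
from math import sqrt

# Same mapping as the Map operator described above: pos -> 1, neu -> 0, neg -> -1.
SENTIMENT_TO_SCORE = {"pos": 1, "neu": 0, "neg": -1}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (sentiment label, retweet count) pairs, for illustration only.
tweets = [("neg", 40), ("neg", 25), ("neu", 10), ("pos", 5), ("pos", 12), ("neu", 8)]

scores = [SENTIMENT_TO_SCORE[label] for label, _ in tweets]
retweets = [count for _, count in tweets]

r = pearson(scores, retweets)
print(f"correlation between sentiment and retweets: {r:.2f}")
```

In this toy sample the negative tweets carry the most retweets, so the coefficient comes out negative.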
  
For purposes of understanding, the following definition of statistical correlation is taken from
the description of the RapidMiner Correlation Matrix operator: "A correlation is a number
between -1 and +1 that measures the degree of association between two attributes (call them
X and Y). A positive value for the correlation implies a positive association. In this case large
values of X tend to be associated with large values of Y and small values of X tend to be
associated with small values of Y. A negative value for the correlation implies a negative or
inverse association. In this case large values of X tend to be associated with small values of Y
and vice versa."	
  
  
RapidMiner Process: Read Database → Analyze Sentiment (Rosette Text Toolkit) → Map → Correlation Matrix
2.2.1 On the correlation calculations	
  
	
  
The relationship between the classification of news into positive, neutral and negative
content and the interest of readers in these stories, especially in the context of
politics, has been the subject of long debate over the past years. In Ref. [1], for example, among
other discussions, Trussler and Soroka investigated how negative, positive and neutral news on
the Internet attracts the interest of readers. The results of their study suggest that news with
negative headlines is more likely to be selected for further reading, especially when the text
has political content, and that this becomes more evident when the topic of the news is
political strategy. Our study shares similarities with [1], but with a social network character: we
analyze how the sentiment (positive/negative/neutral) of content published on Twitter affects
the number of retweets and shares of this content. With the large database captured in
the first stage, we also go further and calculate the correlation between the sentiment and the
number of retweets.
  
	
  
Larger databases, which would yield more reliable correlation estimates,
can be obtained by adjusting the parameters of the RapidMiner tool, or simply by extending the
data collection period so that more tweets are collected.
  
	
  
[1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The
International Journal of Press/Politics 19, 360 (2014).
Just like in [1], we label negative, neutral and positive entries with a sentiment parameter s =
-1, 0, and 1, respectively. This choice of values does not affect the results of the
correlation calculations, either qualitatively or quantitatively, as long as the labels stay
equally spaced and increasing: Pearson's correlation coefficient is unchanged by any rescaling
s → a·s + b with a > 0, as one can check from its mathematical expression. The essential
requirement is that s increases from negative, to neutral, to positive. In this way,
a negative correlation means that lower (higher) values of
s are more likely to be connected to higher (lower) values of the number of retweets. In other
words, converting s back to the sentiment classifications, a negative correlation means a
connection between negative (positive) comments and more (fewer) retweets.
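The check mentioned above can be written out explicitly. In the expression below, n_i denotes the number of retweets of tweet i and bars denote sample means; this notation is introduced here for illustration and does not appear in the original slides.

```latex
r(s, n) = \frac{\sum_i (s_i - \bar{s})(n_i - \bar{n})}
               {\sqrt{\sum_i (s_i - \bar{s})^2}\,\sqrt{\sum_i (n_i - \bar{n})^2}}

% Relabel the sentiment as s'_i = a s_i + b with a > 0. Then
% \bar{s'} = a\bar{s} + b, so s'_i - \bar{s'} = a (s_i - \bar{s}) and

r(s', n) = \frac{a \sum_i (s_i - \bar{s})(n_i - \bar{n})}
                {|a| \sqrt{\sum_i (s_i - \bar{s})^2}\,\sqrt{\sum_i (n_i - \bar{n})^2}}
         = r(s, n).
```

The shift b cancels in the differences and the scale a cancels between numerator and denominator, so labels such as (0, 1, 2) or (-5, 0, 5) give the same coefficient as (-1, 0, 1). Labels that are not equally spaced, such as (-1, 0, 10), are not of this form and can change the value.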
  
	
  
  
	
  
Therefore, the combination of the RapidMiner and Rosette tools allows us to address the old
question of “how the negative/neutral/positive character of a given piece of political news affects its
impact among readers”, but now using data analysis tools, which are easy to handle and avoid
the need to run experiments that usually involve a number of participants, computer-based
surveys, etc.
  
	
  
2.3 Third Stage
In the third stage, a process was set up with the Read Database operator to access the database of
tweets containing the term "Brazil" (for this purpose, a SELECT was used inside the operator to
combine all the tables generated from the Twitter API, one for each day of collection). After this
operator, the following operators of the Rosette Text Toolkit were included: (i)
Categorize, to find all the tweets generated between the 1st and 17th of December 2016 that
the operator classified in the category Law, Gov't & Politics; (ii) Extract Entities, to check whether
specific entities predominate in the tweets classified in the category Law, Gov't &
Politics; and (iii) Analyze Sentiment, to check whether a sentiment classification
predominates for the specific entities found in tweets categorized as Law, Gov't & Politics.
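The SELECT that combines the per-day tables is not shown in the slides. One plausible form is a chain of UNION ALL clauses, sketched below with Python's built-in sqlite3 module in place of MySQL. The table names (`brazil_2016_12_01`, ...) and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical per-day tables, one per collection day (only three shown here).
days = ["2016_12_01", "2016_12_02", "2016_12_03"]
for i, day in enumerate(days, start=1):
    table = f"brazil_{day}"
    conn.execute(
        f"CREATE TABLE {table} (tweet_id INTEGER, text TEXT, retweet_count INTEGER)"
    )
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?)", (i, f"sample tweet {i}", i * 10)
    )

# One SELECT per daily table, glued together with UNION ALL so no rows are dropped.
query = "\nUNION ALL\n".join(
    f"SELECT tweet_id, text, retweet_count FROM brazil_{day}" for day in days
)
combined = conn.execute(query).fetchall()
print(query)
print(len(combined))
```

UNION ALL keeps every row from every table. Plain UNION would also work but silently removes rows that are identical across days, which may or may not be wanted.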
  
RapidMiner Process: Read Database → Categorize → Extract Entities → Analyze Sentiment (Rosette Text Toolkit)
3. Results: second stage
In the first graph, we compare the statistical
correlation obtained over the 17 days of collection
between the volume of retweets and the
sentiment classification of the tweet text. The
database of tweets containing the term "Brazil" shows a weak
tendency for negative tweets to spread more, with a
correlation of -0.16. The tweets containing the term "Michel
Temer" show the same tendency, also weak, with a
correlation of -0.25, although this is close to the zone of
moderate correlation (beyond -0.30), with peaks
reaching -0.49 on some days, as shown in the following
graphs.
  
Correlation result by search term: Michel Temer -0.25; Brazil -0.16
For the term "Michel Temer", when we analyzed the correlation coefficients for each of the
collection days, we noticed that negative values predominate. In fact, no day
shows a meaningful positive correlation (the highest daily value is 0.01), i.e., positive
tweets did not lead to a greater number of retweets on any of these days. In six of the seventeen
days analyzed, the negative correlation even goes beyond the -0.30 mark.
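These counts can be checked against the daily values shown in the "Michel Temer" chart. The short Python sketch below uses the seventeen values as they appear in the chart. Their assignment to specific dates is not recoverable from the slide, so the list order is only the order in which they were read.

```python
# Daily correlation values for the term "Michel Temer", as read from the chart.
daily_r = [-0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15,
           -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33]

THRESHOLD = -0.30  # mark used in the text for a moderate negative correlation

beyond_threshold = [r for r in daily_r if r < THRESHOLD]
above_zero = [r for r in daily_r if r > 0]

print(len(daily_r), "days in total")
print(len(beyond_threshold), "days beyond", THRESHOLD)
print("values above zero:", above_zero)
```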
  
Term: Michel Temer. Correlation per day (17 values, in chart order): -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15, -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33
As for the term "Brazil", when we analyze the correlation coefficient per day, we observe a greater
fluctuation of values, ranging from a moderate correlation favoring the spread of
tweets classified as positive to a moderate correlation favoring the spread of tweets
classified as negative. Tweets that contain the term "Brazil" involve diverse topics, including
but not limited to politics. As a matter of fact, the collection period came shortly after the accident involving
the Chapecoense soccer team, which generated a great commotion worldwide, thus somewhat
affecting the statistical results shown here.
  
Term: Brazil. Correlation per day (17 values, in chart order): -0.16, 0.1, -0.12, -0.01, 0.46, -0.01, 0.1, -0.27, -0.42, -0.01, -0.07, -0.16, -0.17, 0, 0.05, -0.57, 0.36
	
  
For the tweets collected over the seventeen days for each of the terms, we also checked the
percentage of tweets classified as negative in each database. Notice
the large share of negative tweets in the database for the term "Michel Temer". This result
gives us a more complete understanding of the situation and is consistent with the tendency toward
greater dissemination (volume of retweets) of the negative tweets that contain the
term "Michel Temer".
  
Percentage of negative tweets per search term: Michel Temer 75%; Brazil 25%
3. Results: third stage
The focus of this study is Brazil's current political scenario. The graph below shows the
percentage of all tweets containing the term "Brazil" collected during the first 17 days of December
that were categorized as Law, Gov't & Politics by the Categorize operator of
the Rosette Text Toolkit.
  
Distribution by category: Law, Gov't & Politics 2%; Other categories 98%
This graph shows how the tweets categorized as Law, Gov't & Politics are distributed
by sentiment classification. The Categorize and Analyze
Sentiment operators were used, showing that in this category tweets classified as negative are
predominant.
  
Classifier: Law, Gov't & Politics tweets. Pos 20%; Neu 18%; Neg 62%
The graph below shows the combination of two operators of the Rosette Text Toolkit, which makes it possible,
after the tweets have been categorized, to extract the most relevant entities of a given
category. The combination of these operators allows a detailed understanding of the proposed
scenario and can be a powerful tool for decision making in various segments, such as press,
communication or institutional relations.
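The logic of combining categorization, entity extraction and sentiment can be sketched in plain Python. The records below are invented for illustration, and their field names (`category`, `entities`, `sentiment`) are hypothetical: they are not the output schema of the Rosette operators.

```python
from collections import Counter, defaultdict

CATEGORY = "Law, Gov't & Politics"

# Hypothetical, already-annotated tweets; the real annotations come from the toolkit.
records = [
    {"category": CATEGORY, "entities": ["Senate", "President"], "sentiment": "neg"},
    {"category": CATEGORY, "entities": ["Senate"], "sentiment": "neg"},
    {"category": CATEGORY, "entities": ["Supreme Court"], "sentiment": "neu"},
    {"category": "Sports", "entities": ["Chapecoense"], "sentiment": "pos"},
]

# Step 1: keep only the tweets in the category of interest.
political = [r for r in records if r["category"] == CATEGORY]

# Step 2: count how often each entity appears in those tweets.
entity_counts = Counter(e for r in political for e in r["entities"])

# Step 3: tally the sentiment labels of the tweets mentioning each entity.
sentiment_by_entity = defaultdict(Counter)
for r in political:
    for e in r["entities"]:
        sentiment_by_entity[e][r["sentiment"]] += 1

top_entity, top_count = entity_counts.most_common(1)[0]
print(top_entity, top_count, dict(sentiment_by_entity[top_entity]))
```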
  
Key entities found in tweets categorized as Law, Gov't & Politics: Senate 38%; President 24%; Supreme Court 24%
4. Conclusions
By combining RapidMiner with the Rosette Text Toolkit, we were able to
create a powerful way to analyze text data, especially from social networks. Using both tools allows
professionals from different areas who need to analyze this type of information to do so
with little technical knowledge.
The analyses made possible by the Rosette Text Toolkit give
the chance to examine social media data in much more depth than tools with standardized
reports. It is possible to continuously create new metrics and rethink the process of knowledge
discovery in the database. As an example, we can mention the search for patterns in tweets that are
classified as negative, belong to a certain category, and are dominated by a specific entity.
In terms of decision-making, the combination of the two tools, with the methodologies developed
in this work, allows real-time evaluation of people's reaction to a given political scenario. Through
these analyses one can shape institutional campaigns, measure public interest or even tailor
speeches and other communications to people's expectations.
  
5. Final remarks	
  
This work was developed by Delano Lima, who holds a degree in Advertising and a postgraduate degree in Marketing
Management from UNIFOR - University of Fortaleza. The process of converting the sentiment
classifications from text to numerical data was validated in collaboration with Prof. Andrey Chaves, from
the Department of Physics of Universidade Federal do Ceará (UFC), who holds a PhD in Physics from UFC and the
University of Antwerp, in Belgium, with a post-doc period at Columbia University, USA.

For any question about this work, please contact Mr. Delano Lima at delano@miningmetrics.net. For
specific questions about the process of converting sentiment classifications from text to numbers, please
contact Professor Andrey Chaves at andrey@fisica.ufc.br.
  
  
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?

Daiki Mogmet Ito
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?

  • 1.
  • 2. Introduction: This work was developed to take part in the challenge promoted by Rosette, exemplifying the combined use of RapidMiner software and the Rosette extensions for RapidMiner (see https://www.rosette.com/calling-all-data-scientists/). It is worth pointing out that none of the data or results presented here can be considered conclusive about whether the proposed scenario is good or bad. The goal of this work is only to promote the combined use of the two technologies mentioned above. The sample size was not chosen to make the results representative of the proposed scenario.
  • 3. 1. Objectives: - To analyze the relationship between the current state of the economy and politics in Brazil and the international scenario, based on English-language tweets posted on Twitter (www.twitter.com) containing the terms "Brazil" and "Michel Temer". - The terms "Brazil" and "Michel Temer" were chosen because they are intrinsically linked to Brazil's political and economic scenario. There are no political, legal or personal motivations behind this choice. 1.1 Specific objectives: 1.1.1: Analyze the statistical correlation between the number of retweets and the sentiment perceived in each original tweet (positive, neutral or negative), evaluating which type of message spreads most among the analyzed tweets, taking the number of retweets as the measure of spread; 1.1.2: Categorize the tweets, find the main entities in specific categories, and analyze the sentiment of each tweet according to the relevant categories, in order to assess the political and economic scenario of Brazil, the focus of this study.
  • 4. 2. Description: 2.1 First stage: At this stage, using RapidMiner, we collected tweets in English containing the term "Brazil" and tweets containing the term "Michel Temer", current president of Brazil. Two RapidMiner operators were used: (i) Search Twitter, which connects RapidMiner to the Twitter API and retrieves the tweets matching the parameterized query, and (ii) Write Database, which writes those tweets into a local MySQL database. Writing the tweets into a database makes it possible to accumulate them over several days and thus build a larger dataset. Because the Twitter API does not provide data more than a week old, tweets for the two terms were collected repeatedly and accumulated from December 1st to December 17th, 2016. [Figure: RapidMiner process: Search Twitter → Write Database]
  • 5. 2. Description: 2.2 Second stage: At this stage, a RapidMiner process was assembled using the Read Database operator to access the database with the stored tweets, followed by the Analyze Sentiment operator of the Rosette Text Toolkit extension, which calls an external API and classifies the sentiment perceived in each tweet as positive, neutral or negative. After the sentiment classification, the Map operator converts the pos, neu and neg outcomes into the numerical values 1, 0 and -1, respectively. This conversion is needed because the next operator, Correlation Matrix, requires numeric variables. From the correlation matrix generated as the final result, we obtain the statistical correlation between the Sentiment column and the Retweet-Count column. This answers one of the questions defined as an objective of this study: does the way a message spreads through retweets depend on its positive, neutral or negative content?
  • 6. 2. Description: 2.2 Second stage: For reference, the following definition of statistical correlation is taken from the description of the RapidMiner Correlation Matrix operator: "A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa." [Figure: RapidMiner process: Read Database → Analyze Sentiment (Rosette Text Toolkit) → Map → Correlation Matrix]
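The Map and Correlation Matrix steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the RapidMiner process itself: the `tweets` sample is invented for the example, and `pearson` is a hand-written helper rather than the operator's implementation.

```python
from math import sqrt

# Hypothetical sample of (sentiment label, retweet count) pairs.
# These values are invented for illustration; they are not the study's data.
tweets = [("neg", 40), ("neg", 25), ("neu", 10), ("pos", 5), ("pos", 12), ("neu", 8)]

# Same encoding as the Map step in the text: pos -> 1, neu -> 0, neg -> -1.
SENTIMENT_TO_NUMBER = {"pos": 1, "neu": 0, "neg": -1}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


scores = [SENTIMENT_TO_NUMBER[label] for label, _ in tweets]
retweets = [count for _, count in tweets]
r = pearson(scores, retweets)
print(f"correlation(sentiment, retweets) = {r:.2f}")
```

In this invented sample the negative tweets carry the most retweets, so the coefficient comes out negative, which is the pattern the study reads as "negative tweets spread more".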
  • 7. 2. Description: 2.2.1 On the correlation calculations: The relationship between the classification of news as positive, neutral or negative and readers' interest in those stories, especially in the context of politics, has been the subject of long debate in recent years. In Ref. [1], for example, among other discussions, Trussler and Soroka investigated how negative, positive and neutral news on the Internet attracts the interest of readers. Their results suggest that news items with negative headlines are more likely to be selected for further reading, especially when the text has political content, and that this is most evident when the topic is political strategy. Our study shares similarities with [1], but in a social network setting: we analyze how the sentiment (positive/negative/neutral) of content published on Twitter affects the number of retweets and shares of that content. With the large database captured in the first stage, we also go further and calculate the correlation between the sentiment and the number of retweets.
  • 8. 2. Description: 2.2.1 On the correlation calculations: Of course, larger databases, which would provide even more accurate correlation values, can be obtained by adjusting the parameters of the RapidMiner tool or simply by collecting data for longer, so that more tweets are gathered. Just like in [1], we label negative, neutral and positive entries with a sentiment parameter s = -1, 0 and 1, respectively. This choice is arbitrary up to a positive linear rescaling: any equally spaced, increasing encoding (for example 0, 1, 2) gives the same results, both qualitatively and quantitatively, as one can check from the mathematical expression of Pearson's correlation coefficient. The key requirement is that s increases from negative, to neutral, to positive. In this way, a negative correlation means that lower (higher) values of s are more likely to be connected to higher (lower) numbers of retweets. In other words, converting s back to the sentiment classifications, a negative correlation indicates a connection between negative (positive) comments and more (fewer) retweets. [1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The International Journal of Press/Politics 19, 360 (2014).
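As a short worked check of the statement about Pearson's coefficient, the expression below uses s_i for the sentiment value of tweet i and n_i for its number of retweets (the symbol n and the constants a, b are notation introduced here for illustration). It shows that the coefficient is unchanged under a positive linear re-encoding of s; encodings that are increasing but not equally spaced keep the ordering of the labels without guaranteeing the same numerical value.

```latex
r(s, n) = \frac{\sum_i (s_i - \bar{s})(n_i - \bar{n})}
               {\sqrt{\sum_i (s_i - \bar{s})^2}\,\sqrt{\sum_i (n_i - \bar{n})^2}}

% Re-encode the sentiment as s'_i = a s_i + b with a > 0.
% Then \bar{s'} = a\bar{s} + b, so s'_i - \bar{s'} = a (s_i - \bar{s}) and

r(s', n) = \frac{a \sum_i (s_i - \bar{s})(n_i - \bar{n})}
                {a \sqrt{\sum_i (s_i - \bar{s})^2}\,\sqrt{\sum_i (n_i - \bar{n})^2}}
         = r(s, n).

% For a < 0 the denominator picks up |a| instead of a, so the sign of r flips.
```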
  • 9. 2. Description: 2.2.1 On the correlation calculations: Therefore, the combination of the RapidMiner and Rosette tools allows us to address the old question of "how the negative/neutral/positive character of a given political news item affects its impact among readers", but now using data analysis tools that are easy to handle and that avoid the need to run experiments, which usually involve a number of participants, computer-based surveys, etc.
  • 10. 2. Description: 2.3 Third stage: In the third stage, a process was set up with the Read Database operator to access the database of tweets containing the term "Brazil". A SELECT statement was used inside the operator to combine all the tables generated from the Twitter API, one for each day of collection. After this operator, the following operators of the Rosette Text Toolkit were added: (i) Categorize, to find all the tweets generated between the 1st and 17th of December 2016 that the operator classifies in the category Law, Gov't & Politics; (ii) Extract Entities, to check whether specific entities predominate in the tweets classified as Law, Gov't & Politics; and (iii) Analyze Sentiment, to check whether one sentiment classification predominates for the specific entities found in tweets categorized as Law, Gov't & Politics. [Figure: RapidMiner process: Read Database → Categorize → Extract Entities → Analyze Sentiment (Rosette Text Toolkit)]
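The SELECT that combines the per-day tables is not reproduced in the slides. A minimal sketch of one way to do it, assuming one table per collection day with identical columns, is a UNION ALL over those tables. The table names, column names and sample rows below are hypothetical, and an in-memory SQLite database stands in for the MySQL database used in the study.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the local MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical per-day table names; the study's actual names are not given.
daily_tables = ["tweets_2016_12_01", "tweets_2016_12_02"]

for day_index, table in enumerate(daily_tables):
    conn.execute(f"CREATE TABLE {table} (text TEXT, retweet_count INTEGER)")
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?)",
        (f"sample tweet {day_index} about Brazil", day_index),
    )

# One SELECT per day, glued together with UNION ALL so no rows are dropped.
union_query = " UNION ALL ".join(
    f"SELECT text, retweet_count FROM {table}" for table in daily_tables
)
rows = conn.execute(union_query).fetchall()
print(f"{len(rows)} tweets combined from {len(daily_tables)} daily tables")
```

UNION ALL is used rather than UNION so that tweets with identical text and retweet count on different days are kept as separate rows.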
  • 11. 3. Results: second stage: In the first graph, we compare, for each search term, the statistical correlation between the number of retweets and the sentiment classification of the text over the 17 days of collection. For tweets containing the term "Brazil", the correlation is -0.16, a weak tendency for negative tweets to spread more. For tweets containing the term "Michel Temer", the tendency for negative tweets to spread more is also weak, with a correlation of -0.25, although this is close to the zone of moderate correlation (magnitude above 0.30), with peaks reaching -0.49 on some days, as shown in the following graphs. [Chart: Correlation result by search term. Michel Temer: -0.25; Brazil: -0.16]
  • 12. 3. Results: second stage: For the term "Michel Temer", the correlation values for each collection day are predominantly negative. No day shows a meaningfully positive correlation (the highest daily value is 0.01, essentially zero), i.e., positive tweets did not lead to a greater number of retweets on any of these days. On six of the seventeen days analyzed, the negative correlation even goes beyond the -0.30 mark. [Chart: Term: Michel Temer, correlation per day. Values: -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15, -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33]
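The counts quoted for "Michel Temer" can be re-derived from the per-day values shown on this slide. The short check below transcribes those values (decimal commas converted to points) and uses -0.30 as the threshold for moderate correlation, as in the text; the variable names are chosen here for illustration.

```python
# Daily correlation values for the term "Michel Temer", as listed on the slide.
temer_daily = [
    -0.49, -0.27, -0.36, -0.36, -0.08, -0.18, -0.11, 0.01, -0.15,
    -0.24, -0.35, -0.29, -0.03, -0.26, -0.02, -0.37, -0.33,
]

# Days whose negative correlation goes beyond the -0.30 mark.
beyond_moderate = [r for r in temer_daily if r < -0.30]

# Days with a correlation above zero, however small.
positive_days = [r for r in temer_daily if r > 0]

print(len(temer_daily), "days;", len(beyond_moderate), "beyond -0.30;",
      "positive values:", positive_days)
```

The single value above zero is 0.01, which is too small to count as a positive association.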
  • 13. 3. Results: second stage: For the term "Brazil", the correlation per day fluctuates more widely, ranging from a moderate correlation favoring the spread of tweets classified as positive to a moderate correlation favoring the spread of tweets classified as negative. Tweets containing the term "Brazil" cover diverse subjects, not only politics. In particular, the collection period began shortly after the accident involving the Chapecoense soccer team, which caused great commotion worldwide and therefore somewhat affects the statistical results shown here. [Chart: Term: Brazil, correlation per day. Values: -0.16, 0.10, -0.12, -0.01, 0.46, -0.01, 0.10, -0.27, -0.42, -0.01, -0.07, -0.16, -0.17, 0.00, 0.05, -0.57, 0.36]
  • 14. 3. Results: second stage: For the tweets collected over the seventeen days for each term, we also checked the percentage classified as negative in each database. Note the large share of negative tweets in the database for the term "Michel Temer". This result gives a more complete picture of the situation and is consistent with the tendency toward greater dissemination (number of retweets) of negative tweets containing the term "Michel Temer". [Chart: Percentage of negative tweets per search term. Michel Temer: 75%; Brazil: 25%]
  • 15. 3. Results: third stage: The focus of this study is Brazil's current political scenario. The graph below shows, out of all tweets containing the term "Brazil" collected during the first 17 days of December, the percentage categorized as Law, Gov't & Politics by the Categorize operator of the Rosette Text Toolkit. [Chart: Distribution by category. Law, Gov't & Politics: 2%; other categories: 98%]
  • 16. 3. Results: third stage: This graph shows how the tweets categorized as Law, Gov't & Politics are distributed by sentiment classification. The Categorize and Analyze Sentiment operators were used, showing that tweets classified as negative predominate in this category. [Chart: Sentiment of Law, Gov't & Politics tweets. Pos: 20%; Neu: 18%; Neg: 62%]
  • 17. 3. Results: third stage: The graph below shows the combination of two operators of the Rosette Text Toolkit: after the tweets are categorized, the most relevant entities of a given category are extracted. Combining these operators gives a detailed understanding of the proposed scenario and can be a powerful tool for decision making in various fields, such as the press, communication or institutional relations. [Chart: Key entities found in tweets categorized as Law, Gov't & Politics. Senate: 38%; President: 24%; Supreme Court: 24%]
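The third-stage logic (keep one category, then summarize sentiment and entities within it) can be sketched as follows. The records are invented for illustration and are assumed to be already labeled; in the study those labels come from the Categorize, Analyze Sentiment and Extract Entities operators, whose API calls are not reproduced here.

```python
from collections import Counter

# Hypothetical, already-labeled tweets; the labels are invented for this example.
records = [
    {"category": "Law, Gov't & Politics", "sentiment": "neg", "entities": ["Senate"]},
    {"category": "Law, Gov't & Politics", "sentiment": "neg", "entities": ["President"]},
    {"category": "Law, Gov't & Politics", "sentiment": "pos", "entities": ["Senate"]},
    {"category": "Law, Gov't & Politics", "sentiment": "neu", "entities": ["Supreme Court"]},
    {"category": "Sports", "sentiment": "pos", "entities": ["Chapecoense"]},
]

TARGET_CATEGORY = "Law, Gov't & Politics"

# Step 1: keep only the tweets in the category of interest.
politics = [r for r in records if r["category"] == TARGET_CATEGORY]

# Step 2: share of each sentiment label within that category.
sentiment_counts = Counter(r["sentiment"] for r in politics)
sentiment_share = {label: n / len(politics) for label, n in sentiment_counts.items()}

# Step 3: how often each entity is mentioned within that category.
entity_counts = Counter(e for r in politics for e in r["entities"])

print(sentiment_share)
print(entity_counts.most_common())
```

Filtering before counting matters here: the sentiment and entity percentages in the charts are taken over the Law, Gov't & Politics tweets only, not over all collected tweets.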
  • 18. 4. Conclusions: By combining RapidMiner with the Rosette Text Toolkit, we were able to create a powerful way to analyze text data, especially from social networks. Using both tools allows professionals from different areas who need to analyze this type of information to do so with little technical knowledge. Building custom analyses with the Rosette Text Toolkit makes it possible to examine social media data in much more depth than tools that offer only standardized reports. It is possible to continuously create new metrics and to refine the process of knowledge discovery in the database. As an example, we can mention the search for patterns in tweets that are classified as negative, belong to a certain category, and are dominated by a specific entity. In terms of decision making, the combination of the two tools, with the methodologies developed in this work, allows real-time evaluation of people's reactions to a given political scenario. Through these analyses one can shape institutional campaigns, measure public interest or even tailor speeches and other communications to people's expectations.
  • 19. 5. Final remarks: This work was developed by Delano Lima, who holds a degree in Advertising and a postgraduate degree in Marketing Management from UNIFOR - University of Fortaleza. The process of converting the sentiment classifications from text to numerical data was validated in collaboration with Prof. Andrey Chaves, from the Department of Physics of Universidade Federal do Ceará (UFC), who holds a PhD in Physics from UFC and the University of Antwerp, Belgium, and was a postdoctoral researcher at Columbia University, USA. For any question about this work, please contact Mr. Delano Lima at delano@miningmetrics.net. For specific questions about the process of converting the sentiment classifications from text to numbers, please contact Professor Andrey Chaves at andrey@fisica.ufc.br.