This document summarizes an analysis of tweets containing the terms "Brazil" and "Michel Temer" to understand the political and economic scenario in Brazil. RapidMiner was used to collect tweets over 17 days and the Rosette Text Toolkit categorized tweets and analyzed sentiment. For "Michel Temer", there was a weak to moderate negative correlation between sentiment and retweets, and 75% of tweets were negative. For tweets about "Brazil" categorized as law/politics, 62% were negative and the most mentioned entities were the Senate, President, and Supreme Court. The analysis demonstrates how RapidMiner and Rosette can be used together to understand sentiment in social media posts about political topics.
Evolving social data mining and affective analysis Athena Vakali
Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data
Social associations and all kinds of graphs
Evolving social data mining
Emotion-aware social data analysis
Frameworks and Applications
Stock market prediction using Twitter sentiment analysisjournal ijrtem
ABSTRACT : In a study, it was investigated relationship among stock market movement and Tweeter feed content. We are expecting to see if there is connection among sentiment information extracted from the Tweets using a Vader in predicting movements of stock prices. As a result it was obtained strong positive correlation with a coefficient of correlation to be 0.7815.
KEYWORDS : correlation, financial market, polarity, sentiment analysis, tweets
A Survey Of Collaborative Filtering Techniquestengyue5i5j
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ...IRJET Journal
This document discusses controlling the spread of fake or misleading information on social media. It proposes a system to analyze information diffusion on social networks, identify diffused data, and control the spread of fake diffused data. The system would extract data from social media, perform sentiment analysis to determine the veracity of information, and discard fake or untrustworthy information from the database to prevent further propagation. A variety of machine learning techniques could be used for the sentiment analysis, including naive Bayes classification, linear regression, and gradient boosted trees. The goal is to curb the spread of misinformation while still allowing the diffusion of real or truthful information.
The document proposes a multi-tier sentiment analysis system called MSABDP to analyze large-scale social media data more efficiently. MSABDP uses Hadoop for its distributed processing and storage capabilities. It collects Twitter data using Apache Flume and stores it in HDFS. It then applies a multi-tier classification approach combining lexicon-based and machine learning techniques to classify tweets into multiple sentiment classes, reducing complexity compared to single-tier architectures. Evaluation on real Twitter data showed MSABDP improved classification accuracy over single-tier approaches by 7%.
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTELKOMNIKA JOURNAL
This document presents a system for visualizing emotions expressed on Twitter streams over time from different geographic locations. The system collects tweets from the US, Japan, Indonesia, and Taiwan related to iPhones and performs sentiment analysis using naive Bayes classification. It then visualizes the results using two-dimensional heat maps, interactive stream graphs, and context focus brushing. The visualizations allow exploration of temporal patterns in customer emotional behavior across locations expressed in Twitter data.
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsParang Saraf
Abstract: We describe the EMBERS AutoGSR system that conducts automated coding of civil unrest events from news articles published in multiple languages. The nuts and bolts of the AutoGSR system constitute an ecosystem of filtering, ranking, and recommendation models to determine if an article reports a civil unrest event and, if so, proceed to identify and encode specific characteristics of the civil unrest event such as the when, where, who, and why of the protest. AutoGSR is a deployed system for the past 6 months continually processing data 24x7 in languages such as Spanish, Portuguese, English and encoding civil unrest events in 10 countries of Latin America: Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador, Mexico, Paraguay, Uruguay, and Venezuela. We demonstrate the superiority of AutoGSR over both manual approaches and other state-of-the-art en- coding systems for civil unrest.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This document discusses mining social data from graphs by extracting the data from social networks using their APIs or by crawling websites. It describes representing the social data as graphs and using graph mining techniques like finding frequent patterns and substructures using algorithms like Apriori, pattern growth, and CL-CBI (ChunkingLess - Constraint-Based Induction). Decision trees can also be used to iteratively find patterns that branch the data. The challenges include the graph nature of the data, errors and unknowns, and vanity metrics, but graphs are useful for capturing complex social structures.
Evolving social data mining and affective analysis Athena Vakali
Evolving social data mining and affective analysis methodologies, framework and applications - Web 2.0 facts and social data
Social associations and all kinds of graphs
Evolving social data mining
Emotion-aware social data analysis
Frameworks and Applications
Stock market prediction using Twitter sentiment analysisjournal ijrtem
ABSTRACT : In a study, it was investigated relationship among stock market movement and Tweeter feed content. We are expecting to see if there is connection among sentiment information extracted from the Tweets using a Vader in predicting movements of stock prices. As a result it was obtained strong positive correlation with a coefficient of correlation to be 0.7815.
KEYWORDS : correlation, financial market, polarity, sentiment analysis, tweets
A Survey Of Collaborative Filtering Techniquestengyue5i5j
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
IRJET- Big Data Driven Information Diffusion Analytics and Control on Social ...IRJET Journal
This document discusses controlling the spread of fake or misleading information on social media. It proposes a system to analyze information diffusion on social networks, identify diffused data, and control the spread of fake diffused data. The system would extract data from social media, perform sentiment analysis to determine the veracity of information, and discard fake or untrustworthy information from the database to prevent further propagation. A variety of machine learning techniques could be used for the sentiment analysis, including naive Bayes classification, linear regression, and gradient boosted trees. The goal is to curb the spread of misinformation while still allowing the diffusion of real or truthful information.
The document proposes a multi-tier sentiment analysis system called MSABDP to analyze large-scale social media data more efficiently. MSABDP uses Hadoop for its distributed processing and storage capabilities. It collects Twitter data using Apache Flume and stores it in HDFS. It then applies a multi-tier classification approach combining lexicon-based and machine learning techniques to classify tweets into multiple sentiment classes, reducing complexity compared to single-tier architectures. Evaluation on real Twitter data showed MSABDP improved classification accuracy over single-tier approaches by 7%.
Temporal Exploration in 2D Visualization of Emotions on Twitter StreamTELKOMNIKA JOURNAL
This document presents a system for visualizing emotions expressed on Twitter streams over time from different geographic locations. The system collects tweets from the US, Japan, Indonesia, and Taiwan related to iPhones and performs sentiment analysis using naive Bayes classification. It then visualizes the results using two-dimensional heat maps, interactive stream graphs, and context focus brushing. The visualizations allow exploration of temporal patterns in customer emotional behavior across locations expressed in Twitter data.
EMBERS AutoGSR: Automated Coding of Civil Unrest EventsParang Saraf
Abstract: We describe the EMBERS AutoGSR system that conducts automated coding of civil unrest events from news articles published in multiple languages. The nuts and bolts of the AutoGSR system constitute an ecosystem of filtering, ranking, and recommendation models to determine if an article reports a civil unrest event and, if so, proceed to identify and encode specific characteristics of the civil unrest event such as the when, where, who, and why of the protest. AutoGSR is a deployed system for the past 6 months continually processing data 24x7 in languages such as Spanish, Portuguese, English and encoding civil unrest events in 10 countries of Latin America: Argentina, Brazil, Chile, Colombia, Ecuador, El Salvador, Mexico, Paraguay, Uruguay, and Venezuela. We demonstrate the superiority of AutoGSR over both manual approaches and other state-of-the-art en- coding systems for civil unrest.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This document discusses mining social data from graphs by extracting the data from social networks using their APIs or by crawling websites. It describes representing the social data as graphs and using graph mining techniques like finding frequent patterns and substructures using algorithms like Apriori, pattern growth, and CL-CBI (ChunkingLess - Constraint-Based Induction). Decision trees can also be used to iteratively find patterns that branch the data. The challenges include the graph nature of the data, errors and unknowns, and vanity metrics, but graphs are useful for capturing complex social structures.
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
Posting on social networks could be a gratifying or a terrifying experience depending on the reaction the post and its author —by association— receive from the readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn; and finds a predictor function that can estimate those quantitative social gestures.
Predicting the Brand Popularity from the Brand MetadataIJECEIAES
The document presents a framework for predicting brand popularity from brand metadata on social networks. It identifies thoughtful comments from brand posts using natural language processing and classifies them as favorable or unfavorable. Brand metadata like numbers of likes, shares, and identified thoughtful comments are then combined to forecast future brand popularity. The performance of the proposed framework is evaluated against recent works in terms of thoughtful comment identification accuracy, execution time, popularity prediction accuracy, and prediction time. Results show improvements over existing approaches.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous increase in the rise of cyber-crimes, particularly targeted towards women. According to a 2019 report in the [4] Economics Times, India has witnessed a 457% rise in cybercrime in the five year span between 2011 and 2016. Most speculate that this is due to impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creation of user accounts in these sites usually needs just an email-id. A real life person can create multiple fake IDs and hence impostors can easily be made. Unlike the real world scenario where multiple rules and regulations are imposed to identify oneself in a unique manner (for example while issuing one’s passport or driver’s license), in the virtual world of social media, admission does not require any such checks. In this paper, we study the different accounts of Instagram, in particular and try to assess an account as fake or real using Machine Learning techniques namely Logistic Regression and Random Forest Algorithm.
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
This document proposes using Twitter sentiment analysis and an LSTM neural network to predict election results. It involves collecting tweets related to political parties and candidates in India, cleaning the data, training an LSTM classifier on labeled tweets, and using the trained model to classify tweets as positive, negative or neutral sentiment and compare sentiment levels for each candidate. The goal is to analyze public sentiment expressed on Twitter and how it correlates with actual election outcomes.
Multiple Regression to Analyse Social Graph of Brand AwarenessTELKOMNIKA JOURNAL
Social Network Analysis (SNA) has become a common tool to conduct social and business
research. SNA can be used to measure how well a marketing campaign affect conversation in social
media. A good marketing campaign is expected to stimulate conversation between users in social media.
In this paper we use SNA metrics to understand the nature of network of top brand awareness products.
We analyses networks structure of social media conversation regarding cellular service provider and
smartphone brand in Indonesia that achieve top brand awareness in 2015. We use conversational
datasets acquired from Twitter. To get more understanding we also compare the result with network
structure of knowledge dissemination. We use multiple regression algorithm, a machine learning algorithm
that is extension of linear regression, to analyses network properties to get insight on the correlation of the
network structure and brand awareness' rank of a product. The result suggests how we should define
network properties in brand awareness context.
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGAndry Alamsyah
1. The document presents a case study analyzing tweets about Uber using sentiment analysis and topic modeling to understand public opinion from large-scale social media data.
2. Sentiment analysis classified tweets as positive, negative, or neutral, while topic modeling identified dominant topics of discussion, like promotions or driver complaints.
3. The analyses found that positive tweets often discussed promotions while negative tweets addressed issues like sexual harassment allegations or unsatisfactory drivers.
Abstract: This paper introduces a system for visual analysis of news articles and emails. The system was developed in response to VAST MiniChallenge 1 and comprises different interfaces for mining textual data and network data.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This thesis examines an unsupervised approach to classifying users in online social networks using only simple statistics about users' behavior. The author applies sparse principal component analysis (SPCA) to Twitter data without using text or profile content. Key contributions include:
1. Demonstrating that meaningful user classification is possible using only statistics on network structure and communication patterns.
2. Developing a "semantic robustness" score to evaluate how well classifications retain meaning when reanalyzing subsets of the data.
3. Identifying distinct types of users from the top principal components, including measures of influence, spam detectors, and content providers.
IRJET- Competitive Analysis of Attacks on Social MediaIRJET Journal
The document discusses deceptions on social media and proposes a machine learning model to detect fake accounts. It first introduces that social media brings both benefits and disadvantages, with deceptions causing problems. It then describes a machine learning pipeline with three parts: (1) a Cluster Builder that groups accounts into real and fake clusters, (2) a Profile Featurizer that extracts features from account profiles to represent clusters, and (3) an Account Scorer that trains models on cluster features and scores new clusters to detect fake accounts. The goal is to use machine learning to help address challenges of deceptions on social media.
Epidemiological Modeling of News and Rumors on TwitterParang Saraf
Abstract: Characterizing information diffusion on social platforms like Twitter enables us to understand the properties of underlying media and model communication patterns. As Twitter gains in popularity, it has also become a venue to broadcast rumors and misinformation. We use epidemiological models to characterize information cascades in twitter resulting from both news and rumors. Specifically, we use the SEIZ enhanced epidemic model that explicitly recognizes skeptics to characterize eight events across the world and spanning a range of event types. We demonstrate that our approach is accurate at capturing diffusion in these events. Our approach can be fruitfully combined with other strategies that use content modeling and graph theoretic features to detect (and possibly disrupt) rumors.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Poster presentation in 3rd big data conclave at vit chennai on 20th april 2017Rohit Desai
Title:Leveraging Big Data and Social Sensors For Predicting Epidemic Disease Outbreak
Event & Venue:3rd Big Data Conclave at VIT Chennai on 20th & 21st April 2017
Project Conducted Under Guide:Dr.Sweetlin Hemalatha Professor at VIT Chennai
Research of usability of Mashup Tools done for Kent County Council as part of the Pic and Mix Pilot (2009), opening up Kent related datasets for all to use and exploit.
IRJET - Election Result Prediction using Sentiment AnalysisIRJET Journal
This document proposes a method to predict election results using sentiment analysis of social media data. It involves collecting data from Twitter, Facebook, and Instagram using their APIs. The data will then be preprocessed by removing special characters and URLs. Popular machine learning algorithms like Naive Bayes and SVM will be trained on the preprocessed data to classify tweets as positive, negative, or neutral sentiment toward political parties. The classified tweets will then be analyzed to predict the outcome of elections.
This document describes a system created by researchers to analyze sentiment in tweets about 2012 US presidential candidates in real-time. The system collects tweets through the Twitter API, preprocesses them by tokenizing text, matches tweets to candidates, analyzes sentiment using a model trained on Twitter language, aggregates sentiment results by candidate, and visualizes the analysis through interactive dashboards. It provides up-to-the-minute insight into how public opinion responds to political events as expressed on Twitter.
A major North American telecom sought to identify factors driving customer churn. We applied social network analysis over several billion call records. We found that customers with a cancellation in their frequent calling network churned at twice the monthly rate.
This document provides an initial literature review and research proposal for a project analyzing social media data from Twitter to predict crime rates. The research aims to identify evidence of criminal activity by analyzing geotagged tweets and visualizing crime hotspots. Challenges include issues with jurisdiction when crimes span multiple countries and limitations of current UK law in addressing online crimes. The methodology will use both traditional legal research and interdisciplinary approaches. The literature review discusses prior research on crime prediction methods and limitations, as well as the potential for social media data to analyze crimes.
Forex-Foreteller: Currency Trend Modeling using News ArticlesParang Saraf
Abstract: Financial markets are quite sensitive to unanticipated news and events. Identifying the effect of news on the market is a challenging task. In this demo, we present Forex-foreteller (FF) which mines news articles and makes forecasts about the movement of foreign currency markets. The system uses a combination of language models, topic clustering, and sentiment analysis to identify relevant news articles. These articles along with the historical stock index and currency exchange values are used in a linear regression model to make forecasts. The system has an interactive visualizer designed specifically for touch-sensitive devices which depicts forecasts along with the chronological news events and financial data used for making the forecasts.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Slides: Epidemiological Modeling of News and Rumors on TwitterParang Saraf
This document presents research on modeling the spread of news and rumors on Twitter using epidemiological models. It finds that a SEIZ (Susceptible-Exposed-Infected-Skeptics) model more accurately describes the spread of information on Twitter compared to a simpler SIS (Susceptible-Infected-Susceptible) model, especially at the initial stages. The researchers analyzed several real-world events and found the SEIZ model produced lower errors between the model and actual tweet volumes. Parameters extracted from fitting the SEIZ model to data could help identify rumors versus factual news on Twitter. Limitations include not incorporating information about followers or population characteristics.
In the age of social media communication, it is easy to
modulate the minds of users and also instigate violent
actions being taken by them in some cases. There is a need
to have a system that can analyze the threat level of tweets
from influential users and rank their Twitter handles so
that dangerous tweets can be avoided going public on
Twitter before fact-checking which can hurt the sentiments
of people and can take the shape of violence. The study
aims to analyse and rank twitter users according to their
influential power and extremism of their tweets to help
prevent major protests and violent events. We scraped top
trending topics and fetched tweets using those hashtags.
We propose a custom ranking algorithm which considers
source based and content based features along with a
knowledge graph which generates the score and rank the
twitter users according to the scores. Our aim with this
study is to identify and rank extremist twitter users with
regards to their impact and influence. We use a technique
that takes into consideration both source based and
content-based features of tweets to generate the ranking of
the extremist twitter users having a high impact factor
This document proposes a method to quantify the political leaning of Twitter users based on their tweet and retweet activity. It formulates the inference of political leaning as a convex optimization problem that incorporates two ideas: (1) a user's tweets and retweets should be consistent in sentiment, and (2) similar users tend to be retweeted by similar audiences. The method is evaluated on 119 million election-related tweets from the 2012 US presidential election and achieves 94% accuracy in classifying frequently retweeted sources. A quantitative analysis of the tweets also finds that parody accounts and less vocal users are more likely to be liberal, while hashtags usage changes significantly with political events.
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
Posting on social networks could be a gratifying or a terrifying experience depending on the reaction the post and its author —by association— receive from the readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn; and finds a predictor function that can estimate those quantitative social gestures.
Predicting the Brand Popularity from the Brand MetadataIJECEIAES
The document presents a framework for predicting brand popularity from brand metadata on social networks. It identifies thoughtful comments from brand posts using natural language processing and classifies them as favorable or unfavorable. Brand metadata like numbers of likes, shares, and identified thoughtful comments are then combined to forecast future brand popularity. The performance of the proposed framework is evaluated against recent works in terms of thoughtful comment identification accuracy, execution time, popularity prediction accuracy, and prediction time. Results show improvements over existing approaches.
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
With the advent of the Internet and social media, while hundreds of people have benefitted from the vast sources of information available, there has been an enormous increase in the rise of cyber-crimes, particularly targeted towards women. According to a 2019 report in the [4] Economics Times, India has witnessed a 457% rise in cybercrime in the five year span between 2011 and 2016. Most speculate that this is due to impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these definitely help in creating a sound social network, creation of user accounts in these sites usually needs just an email-id. A real life person can create multiple fake IDs and hence impostors can easily be made. Unlike the real world scenario where multiple rules and regulations are imposed to identify oneself in a unique manner (for example while issuing one’s passport or driver’s license), in the virtual world of social media, admission does not require any such checks. In this paper, we study the different accounts of Instagram, in particular and try to assess an account as fake or real using Machine Learning techniques namely Logistic Regression and Random Forest Algorithm.
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
This document proposes using Twitter sentiment analysis and an LSTM neural network to predict election results. It involves collecting tweets related to political parties and candidates in India, cleaning the data, training an LSTM classifier on labeled tweets, and using the trained model to classify tweets as positive, negative or neutral sentiment and compare sentiment levels for each candidate. The goal is to analyze public sentiment expressed on Twitter and how it correlates with actual election outcomes.
Multiple Regression to Analyse Social Graph of Brand AwarenessTELKOMNIKA JOURNAL
Social Network Analysis (SNA) has become a common tool to conduct social and business
research. SNA can be used to measure how well a marketing campaign affect conversation in social
media. A good marketing campaign is expected to stimulate conversation between users in social media.
In this paper we use SNA metrics to understand the nature of network of top brand awareness products.
We analyses networks structure of social media conversation regarding cellular service provider and
smartphone brand in Indonesia that achieve top brand awareness in 2015. We use conversational
datasets acquired from Twitter. To get more understanding we also compare the result with network
structure of knowledge dissemination. We use multiple regression algorithm, a machine learning algorithm
that is extension of linear regression, to analyses network properties to get insight on the correlation of the
network structure and brand awareness' rank of a product. The result suggests how we should define
network properties in brand awareness context.
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELINGAndry Alamsyah
1. The document presents a case study analyzing tweets about Uber using sentiment analysis and topic modeling to understand public opinion from large-scale social media data.
2. Sentiment analysis classified tweets as positive, negative, or neutral, while topic modeling identified dominant topics of discussion, like promotions or driver complaints.
3. The analyses found that positive tweets often discussed promotions while negative tweets addressed issues like sexual harassment allegations or unsatisfactory drivers.
Abstract: This paper introduces a system for visual analysis of news articles and emails. The system was developed in response to VAST MiniChallenge 1 and comprises different interfaces for mining textual data and network data.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
This thesis examines an unsupervised approach to classifying users in online social networks using only simple statistics about users' behavior. The author applies sparse principal component analysis (SPCA) to Twitter data without using text or profile content. Key contributions include:
1. Demonstrating that meaningful user classification is possible using only statistics on network structure and communication patterns.
2. Developing a "semantic robustness" score to evaluate how well classifications retain meaning when reanalyzing subsets of the data.
3. Identifying distinct types of users from the top principal components, including measures of influence, spam detectors, and content providers.
IRJET- Competitive Analysis of Attacks on Social MediaIRJET Journal
The document discusses deceptions on social media and proposes a machine learning model to detect fake accounts. It first introduces that social media brings both benefits and disadvantages, with deceptions causing problems. It then describes a machine learning pipeline with three parts: (1) a Cluster Builder that groups accounts into real and fake clusters, (2) a Profile Featurizer that extracts features from account profiles to represent clusters, and (3) an Account Scorer that trains models on cluster features and scores new clusters to detect fake accounts. The goal is to use machine learning to help address challenges of deceptions on social media.
Epidemiological Modeling of News and Rumors on TwitterParang Saraf
Abstract: Characterizing information diffusion on social platforms like Twitter enables us to understand the properties of underlying media and model communication patterns. As Twitter gains in popularity, it has also become a venue to broadcast rumors and misinformation. We use epidemiological models to characterize information cascades in twitter resulting from both news and rumors. Specifically, we use the SEIZ enhanced epidemic model that explicitly recognizes skeptics to characterize eight events across the world and spanning a range of event types. We demonstrate that our approach is accurate at capturing diffusion in these events. Our approach can be fruitfully combined with other strategies that use content modeling and graph theoretic features to detect (and possibly disrupt) rumors.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Poster presentation in 3rd big data conclave at vit chennai on 20th april 2017Rohit Desai
Title:Leveraging Big Data and Social Sensors For Predicting Epidemic Disease Outbreak
Event & Venue:3rd Big Data Conclave at VIT Chennai on 20th & 21st April 2017
Project Conducted Under Guide:Dr.Sweetlin Hemalatha Professor at VIT Chennai
Research of usability of Mashup Tools done for Kent County Council as part of the Pic and Mix Pilot (2009), opening up Kent related datasets for all to use and exploit.
IRJET - Election Result Prediction using Sentiment AnalysisIRJET Journal
This document proposes a method to predict election results using sentiment analysis of social media data. It involves collecting data from Twitter, Facebook, and Instagram using their APIs. The data will then be preprocessed by removing special characters and URLs. Popular machine learning algorithms like Naive Bayes and SVM will be trained on the preprocessed data to classify tweets as positive, negative, or neutral sentiment toward political parties. The classified tweets will then be analyzed to predict the outcome of elections.
This document describes a system created by researchers to analyze sentiment in tweets about 2012 US presidential candidates in real-time. The system collects tweets through the Twitter API, preprocesses them by tokenizing text, matches tweets to candidates, analyzes sentiment using a model trained on Twitter language, aggregates sentiment results by candidate, and visualizes the analysis through interactive dashboards. It provides up-to-the-minute insight into how public opinion responds to political events as expressed on Twitter.
A major North American telecom sought to identify factors driving customer churn. We applied social network analysis over several billion call records. We found that customers with a cancellation in their frequent calling network churned at twice the monthly rate.
This document provides an initial literature review and research proposal for a project analyzing social media data from Twitter to predict crime rates. The research aims to identify evidence of criminal activity by analyzing geotagged tweets and visualizing crime hotspots. Challenges include issues with jurisdiction when crimes span multiple countries and limitations of current UK law in addressing online crimes. The methodology will use both traditional legal research and interdisciplinary approaches. The literature review discusses prior research on crime prediction methods and limitations, as well as the potential for social media data to analyze crimes.
Forex-Foreteller: Currency Trend Modeling using News ArticlesParang Saraf
Abstract: Financial markets are quite sensitive to unanticipated news and events. Identifying the effect of news on the market is a challenging task. In this demo, we present Forex-foreteller (FF) which mines news articles and makes forecasts about the movement of foreign currency markets. The system uses a combination of language models, topic clustering, and sentiment analysis to identify relevant news articles. These articles along with the historical stock index and currency exchange values are used in a linear regression model to make forecasts. The system has an interactive visualizer designed specifically for touch-sensitive devices which depicts forecasts along with the chronological news events and financial data used for making the forecasts.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
Slides: Epidemiological Modeling of News and Rumors on TwitterParang Saraf
This document presents research on modeling the spread of news and rumors on Twitter using epidemiological models. It finds that a SEIZ (Susceptible-Exposed-Infected-Skeptics) model more accurately describes the spread of information on Twitter compared to a simpler SIS (Susceptible-Infected-Susceptible) model, especially at the initial stages. The researchers analyzed several real-world events and found the SEIZ model produced lower errors between the model and actual tweet volumes. Parameters extracted from fitting the SEIZ model to data could help identify rumors versus factual news on Twitter. Limitations include not incorporating information about followers or population characteristics.
In the age of social media communication, it is easy to
modulate the minds of users and also instigate violent
actions being taken by them in some cases. There is a need
to have a system that can analyze the threat level of tweets
from influential users and rank their Twitter handles so
that dangerous tweets can be avoided going public on
Twitter before fact-checking which can hurt the sentiments
of people and can take the shape of violence. The study
aims to analyse and rank twitter users according to their
influential power and extremism of their tweets to help
prevent major protests and violent events. We scraped top
trending topics and fetched tweets using those hashtags.
We propose a custom ranking algorithm which considers
source based and content based features along with a
knowledge graph which generates the score and rank the
twitter users according to the scores. Our aim with this
study is to identify and rank extremist twitter users with
regards to their impact and influence. We use a technique
that takes into consideration both source based and
content-based features of tweets to generate the ranking of
the extremist twitter users having a high impact factor
This document proposes a method to quantify the political leaning of Twitter users based on their tweet and retweet activity. It formulates the inference of political leaning as a convex optimization problem that incorporates two ideas: (1) a user's tweets and retweets should be consistent in sentiment, and (2) similar users tend to be retweeted by similar audiences. The method is evaluated on 119 million election-related tweets from the 2012 US presidential election and achieves 94% accuracy in classifying frequently retweeted sources. A quantitative analysis of the tweets also finds that parody accounts and less vocal users are more likely to be liberal, while hashtags usage changes significantly with political events.
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYcscpconf
This document summarizes previous research on analyzing social media like Twitter to detect criminal or suspicious activity. It discusses frameworks that have used features of Twitter like hashtags, mentions, retweets etc. to analyze user profiles and communication networks. Some key approaches discussed include using graph analysis to identify influencers and automated accounts, predicting crime locations using Twitter data and linear regression, and detecting criminal networks through node analysis. However, many of the past studies did not fully address challenges like evaluating tweet and user credibility to reduce false positives.
Stock market prediction using Twitter sentiment analysisIJRTEMJOURNAL
In a study, it was investigated relationship among stock market movement and Tweeter feed
content. We are expecting to see if there is connection among sentiment information extracted from the Tweets
using a Vader in predicting movements of stock prices. As a result it was obtained strong positive correlation with
a coefficient of correlation to be 0.7815.
The document provides an overview of sentiment analysis and summarizes the current approaches used. It discusses how machine learning classifiers like Naive Bayes can be used for sentiment classification of texts, treating it as a two-class text classification problem. It also mentions the use of natural language processing techniques. The current system discussed will use machine learning and NLP for sentiment analysis of tweets, training classifiers on labeled tweet data to classify the polarity of new tweets.
A large-scale sentiment analysis using political tweetsIJECEIAES
Twitter has become a key element of political discourse in candidates’ campaigns. The political polarization on Twitter is vital to politicians as it is a popular public medium to analyze and predict public opinion concerning political events. The analysis of the sentiment of political tweet contents mainly depends on the quality of sentiment lexicons. Therefore, it is crucial to create sentiment lexicons of the highest quality. In the proposed system, the domain-specific of the political lexicon is constructed by using the supervised approach to extract extreme political opinions words, and features in tweets. Political multi-class sentiment analysis (PMSA) system on the big data platform is developed to predict the inclination of tweets to infer the results of the elections by conducting the analysis on different political datasets: including the Trump election dataset and the BBC News politics. The comparative analysis is the experimental results which are better political text classification by using the three different models (multinomial naïve Bayes (MNB), decision tree (DT), linear support vector classification (SVC)). In the comparison of three different models, linear SVC has the better performance than the other two techniques. The analytical evaluation results show that the proposed system can be performed with 98% accuracy in linear SVC.
This document summarizes the Fall 2018 issue of the Journal of Public Relations Education. It includes an introduction from the editor, a table of contents listing three research articles and teaching briefs on public relations education topics, and two software reviews of social media monitoring tools. The issue reflects work from previous editors and reviewers to select and format research and teaching content for publication.
Using Social Media to Measure the Consumer Confidence: The Twitter Case in SpainManu García
The goal of this project is to make an index which contains the consumer confidence. The source of information used to create this indicator has been Twitter, the methodology comes from opinion mining sentiment analysis and the metric used to check if this information can be usefull is the pearson's correlation coeficient.
There is evidence that the consumer confidence can be found in social media and these findings might be usefull to create alternative indexes in shorter intervals of time, or even in regional basis.
This document summarizes a research paper on analyzing sentiments from Twitter data using data mining techniques. The paper presents an approach for analyzing user sentiments using data mining classifiers and compares the performance of single classifiers versus an ensemble of classifiers for sentiment analysis. Experimental results show that the k-nearest neighbor classifier achieved very high predictive accuracy, and single classifiers outperformed the ensemble approach.
591 Final Report - Team 7 - Political IssuesTim Sawicki
This document discusses collecting and analyzing social media data related to political issues from the 2016 US presidential election. It describes scraping data from Facebook, Twitter, and Reddit to analyze what issues were most debated and contested. Location data was extracted from tweets to analyze debates by state. Issues were identified by frequency and sentiment analysis was used to measure passion around topics. The results were compiled into an interactive web application to filter issues and locations.
Text mining on Twitter information based on R platformFayan TAO
This document summarizes an analysis of tweets related to the hashtag "#prayforparis" following the November 2015 Paris attacks. The analysis was conducted using R for text mining. It preprocessed 3200 tweets by cleaning, stemming words, and building a term-document matrix. Frequent words and associations were identified. Tweets were clustered into 10 groups and sentiment analysis was performed, finding most expressed sadness, anger, and prayed for Paris with a positive attitude toward the attacks. Key text mining techniques like preprocessing, matrix building, clustering and sentiment analysis were implemented in R to analyze information from Twitter about this event.
There are various online networking sites such as Facebook, twitter where students casually discuss their educational
experiences, their opinions, emotions, and concerns about the learning process. Information from such open environment can
give valuable knowledge for opinions, emotions and help the educational organizations to get insight into students’ educational
life. Analysing down such data, on the other hand, can be challenging therefore a qualitative research and significant data
mining process needs to be done. Sentiment classification can be done using NLP (Natural Language Processing). For a social
network that provides micro blogging services such as twitter, the incoming tweets can be classified into News, Opinions,
Events, Deals and private Messages based on authors information available in the tweets. This approach is similar to
Tweetstand, which classifies the tweets into news and non-news. Even for e-commerce applications virtual customer
environments can be created using social networking sites. Since the data is ever growing, using data mining techniques can get
difficult, hence we can use data analysis tools
Big data analysis of news and social media contentFiras Husseini
This document summarizes research from the Intelligent Systems Laboratory at the University of Bristol on analyzing large amounts of news and social media content using computational methods. It discusses several studies, including analyzing over 400 million tweets to track public mood in the UK, extracting narrative networks from over 125,000 news articles about the 2012 US elections, comparing differences between news outlets and their topics/writing styles using machine learning, modeling the EU news media network using clustering and translation techniques, and predicting popular news articles based on their content. The research demonstrates how computational social science can reveal patterns in large datasets that were previously impossible to analyze at scale.
This document summarizes a study analyzing social media influence and credibility. A team of students and professors extracted different types of data from Twitter, including mentions of users, tweets by authorities, and keywords. They developed a semantic parser to analyze tweet content using ontological semantic technology. An initial linear score was formulated to measure user influence, and network analysis identified pivotal users between communities. Validation will compare content analysis to real-world events to supplement credibility assessment. The study has potential applications in public policy, business, psychology, and traffic monitoring.
IRJET - Implementation of Twitter Sentimental Analysis According to Hash TagIRJET Journal
This document proposes a model for analyzing sentiment from tweets using hashtags. It involves collecting tweets, preprocessing the data by removing URLs and stopwords, training a classifier using Naive Bayes, and classifying tweets as positive, negative, or neutral. Hashtags are also classified to help organize tweets by topic. The proposed system is intended to help large companies understand public sentiment about their brands by analyzing tweets in real-time.
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
This document discusses improving real-time Twitter sentiment analysis using machine learning and Word2Vec. It begins by introducing sentiment analysis and its importance for product analysis and business. Next, it describes extracting Twitter data using APIs, preprocessing it, and applying machine learning algorithms like Naive Bayes, logistic regression, and decision trees to classify tweets as expressing positive, negative or neutral sentiment. It also reviews literature on using techniques like linguistic analysis and ensemble models to improve sentiment analysis from social media sources.
IRJET - Political Orientation Prediction using Social Media ActivityIRJET Journal
This document discusses research into predicting the political orientation of Twitter users based on their social media activity. The researchers aim to incorporate factors like tweets, retweets, followers, followees, and network connections to better understand how political views are expressed and shaped on Twitter. Prior studies that have analyzed political bias in media outlets and the spread of information across partisan networks on social media are reviewed. The researchers describe collecting and analyzing Twitter data including tweets, retweets, mentions, followers and who a user follows to predict individual users' political leanings.
Running head DEPRESSION PREDICTION DRAFT1DEPRESSION PREDICTI.docxhealdkathaleen
This paper explores using machine learning and natural language processing techniques to analyze social media posts and other online behaviors to detect levels of depression in individuals. Key approaches discussed include using k-means clustering and neural networks on sources like reviews, posts, and articles. Link mining and weighted network modeling are also used to understand relationships between online content and detect patterns associated with depression. The goal is to help identify individuals who may be depressed so counselors can better assist them.
IRJET- An Experimental Evaluation of Mechanical Properties of Bamboo Fiber Re...IRJET Journal
This document discusses tweet segmentation and its application to named entity recognition. It proposes a novel framework called Hybrid Segmentation that segments tweets into meaningful phrases by maximizing the stickiness score of candidate segments based on global context from external sources like Wikipedia and local context from linguistic features within a batch of tweets. Two methods are developed - one using local linguistic features and the other using local term dependency. Experimental results on two tweet datasets show improved segmentation quality when using both global and local contexts compared to global context alone. The segmented tweets achieve higher accuracy for named entity recognition compared to word-based methods, demonstrating the benefit of tweet segmentation for downstream natural language processing applications.
IRJET- Tweet Segmentation and its Application to Named Entity RecognitionIRJET Journal
This document summarizes a research paper on tweet segmentation and its application to named entity recognition. It proposes a novel framework called Hybrid Segmentation that splits tweets into meaningful segments to preserve semantic context for downstream natural language processing applications like named entity recognition. HybridSeg finds optimal tweet segmentations by maximizing the stickiness score of segments, which considers global context based on English phrases and local context based on linguistic features or term dependencies within a batch of tweets. Experiments show significantly improved segmentation quality when learning both global and local contexts compared to global context alone. The segmented tweets can then be used for high accuracy named entity recognition through part-of-speech tagging.
Similar to Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics? (20)
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
20 Comprehensive Checklist of Designing and Developing a WebsitePixlogix Infotech
Dive into the world of Website Designing and Developing with Pixlogix! Looking to create a stunning online presence? Look no further! Our comprehensive checklist covers everything you need to know to craft a website that stands out. From user-friendly design to seamless functionality, we've got you covered. Don't miss out on this invaluable resource! Check out our checklist now at Pixlogix and start your journey towards a captivating online presence today.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics?
1.
2. Introduction:
The present work has been developed with the purpose of participating in the challenge
promoted by Rosette, exemplifying the combined use of RapidMiner software and Rosette
extensions for RapidMiner (see https://www.rosette.com/calling-all-data-scientists/).
It is worth to point out that none of the presented data or results can be said to be conclusive
regarding the good or bad qualification of the proposed scenario. In fact, the goal of this work
is only to promote the combined use of the two technologies mentioned above. The size of the
sample was not taken into account so that the results represent the qualification of the
proposed scenario.
2
3. 1. Objectives:
- To analyze the relationship between the current moment of economy and politics in Brazil,
and the international scenario, based on tweets generated in the social network Twitter
(www.twitter.com) in English language, containing the terms "Brazil" and "Michel Temer".
- The choice of the terms "Brazil" and "Michel Temer" were based on the fact that these two
terms are intrinsically linked to Brazil's political and economic scenario. There are no political
motivations, legal or personal, in this choice.
1.1 Specific objectives:
1.1.1: Analyze the statistical correlation between the number of retweets and the perceived
feeling in each of the original tweets (positive, neutral or negative), evaluating which type of
message has the greatest proliferation power among the analyzed tweets, considering the
number of retweets as a factor of proliferation;
1.1.2: Categorize, search the main entities in specific categories, as well as analyze the
sentiment of each tweet, according to the relevant categories, in order to assess the political
and economic scenario of Brazil, focus of this study.
3
4. 2. Description:
2.1 First stage:
At this stage, using RapidMiner, we have collected tweets in English containing the term "Brazil" and
tweets containing the term "Michel Temer", current president of Brazil. For this stage, two
RapidMiner operators were used: (i) Search on Twitter, which connects Rapid Miner with the Twitter
API and get the tweets with the parameterized query, and (ii) the Write Database operator, which
writes those tweets into a local MySQL-type database. Writing the tweets into a database is
important to accumulate the tweets along several days and, thus, generate a bigger database.
Tweets for the two terms were accumulated from December 1st to December 17th 2016, following
the Twitter API rules of not providing data that is more than a week old.
4
Search
Twitter
Write
Database
RapidMiner Process:
5. 2. Description:
2.2 Second stage:
At this stage, a RapidMiner process was assembled using the read database operator to access
the database with the stored tweets, followed by the Analyze Sentiment operator of the
Rosette Text Toolkit extension, which, using the external API communication, ranks each tweet
with respect to the Feeling perceived in the text as positive, neutral or negative. After the
feeling classification, the Map operator was inserted to convert the pos, neu or neg outcomes
into numerical parameters 1, 0 or -1, respectively. This conversion is required for the next
Correlation Matrix operator, which requires numeric variables. From the correlation matrix
generated as the final result, we obtain the statistical correlation between the Sentiment
column and the Retweet-Count column. This provides the answers to one of the questions
previously defined as the objective of this study: does the way a message spreads through
retweets depend on its positive, neutral or negative content?
5
6. 2. Description:
2.2 Second stage:
For purposes of understanding, the following definition of statistical correlation is taken from
the description of the RapidMiner Correlation Matrix operator: "A correlation is a number
between -1 and +1 that measures the degree of association between two attributes (call them
X and Y). A positive value for the correlation implies a positive association. In this case large
values of X tend to be associated with large values of Y and small values of X tend to be
associated with small values of Y. A negative value for the correlation implies a negative or
inverse association. In this case large values of X tend to be associated with small values of Y
and vice versa."
6
Read
Database
Analyze
Sentiment
RapidMiner Process:
Map Correlation
Matrix
Rosette Text Toolkit
7. 2. Description:
2.2.1 On the correlation calculations
The relationship between the classification of news into positive, neutral and negative
contents and the interest of the readers in these stories, especially within the context of
politics, has been subject of long debate over the past years. In Ref. [1], for example, among
other discussions, Trussler and Soroka investigated how negative, positive and neutral news on
the Internet attract the interest of readers. Results of their study suggest that news with
negative headlines are more prone to be selected to be read further, specially when the text
has political content, although this becomes more evident when the topic of the news is on
political strategy. Our study share similarities with [1], but with a social network character: we
analyze how the sentiment (positive/negative/neutral) of contents published in Twitter affect
the number of re-tweets and sharing of these contents. With the large database captured in
the first stage, we also go further and calculate correlation between the sentiment and the
number of re-tweets.
7
8. 2. Description:
2.2.1 On the correlation calculations
Of course, larger databases, that would provide even more accurate correlation parameters,
can be obtained by adjusting parameters of the RapidMiner tool, or simply by taking a longer
data collection time, so that more tweets are collected.
[1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The
International Journal of Press/Politics 19, 360 (2014).
Just like in [1], we label negative, neutral and positive entries with a sentiment parameter s =
-1, 0, and 1, respectively. This arbitrary choice of parameters does not affect the results of the
correlation calculations, either qualitatively of quantitatively, as one can check by analyzing
e.g. Pearson's correlation coefficient mathematical expression. The only requirement for the
parameter s is to have its value increasing from negative, to neutral, to positive – in this way,
we understand e.g. a negative correlation as being due to the fact that lower (higher) values of
s are more likely to be connected to higher (lower) values of the number of retweets – in other
words, converting s back to the sentiment classifications, negative correlation would mean a
connection between negative (positive) comments and more (less) re-tweets.
8
9. 2. Description:
2.2.1 On the correlation calculations
Therefore, the combination of RapidMiner and Rosette tools allows us to assess the old
question of “how the negative/neutral/positive character of a given political news affects its
impact among readers”, but now using data analysis tools, which are easy to handle and avoid
the need of running experiments that usually involve a number of participants, computer
based surveys, etc.
[1] M. Trussler and S. Soroka, Consumer Demand for Cynical and Negative News Frames, The
International Journal of Press/Politics 19, 360 (2014).
9
10. 2. Description:
2.3 Third Stage
In the third stage, a process was set up with the Read Database operator to access the database with
tweets with the term "Brazil" (for this purpose, a SELECT was used inside the operator, in order to
combine all the tables generated from the Twitter API, one for each day of collection). After this
operator was inserted, the following operators of the Rosette Text Toolkit were included: (i)
Categorize, in order to find all the tweets generated between 1st and 17th of December of 2016
classified by the operator in the category Law, Gov't & Politics; (ii) Extract Entities, which analyzes if
there is a predominance of specific entities in the tweet classified in the category Law, Gov't &
Politics; and (iii) Analyze Sentiment, which analyzes if there is a predominant sentiment
classification, by specific entities found in tweets categorized as Law, Gov't & Politics.
10
Read
Database
Extract
Entities
RapidMiner Process:
Categorize
Analyze
Sentiment
Rosette Text Toolkit
11. 3. Results: second stage
In the first graph, we compare the statistical
correlation obtained during the 17 days of collection
between the volume of retweets and the
classification of feeling in the generated text. It
should be noted that the correlation of the database
with tweets with the term "Brazil" shows a tendency
of dissemination of weak negative tweets, with result
of -0,16. The tweets generated with the term "Michel
Temer" have a correlation with the tendency of
dissemination of negative tweets also weak, with
result of -0.25, although close to the zone of
moderate correlation (above -0.30), with peaks
reaching -0.49 some days, as shown in the following
graphs.
11
-0,25
-0,16
Michel Temer Brazil
Correlation result by
search term
12. 3. Results: second stage
For the term "Michel Temer", when we analyzed the statistical correlation factors for each of the
collection days, we noticed the negative predominance of correlation indexes. In fact, there is
no case along the analyzed days where positive correlation indexes are observed, i.e., positive
tweets didn't lead to greater number of retweets in any of these days. In six of the seventeen
days analyzed, the negative correlation level even exceeds the -0.30 margin.
12
-0,49
-0,27
-0,36 -0,36
-0,08
-0,18
-0,11
0,01
-0,15
-0,24
-0,35
-0,29
-0,03
-0,26
-0,02
-0,37-0,33
Term: Michel Temer
Correlation per day
13. 3. Results: second stage
As for the term "Brazil", when we analyze the correlation factor per day, we observe a greater
fluctuation of indexes, ranging from a moderate correlation strength for the dissemination of
tweets classified as positive, to a moderate correlation force for proliferation of tweets
classified as negative. Tweets that have the term Brazil involve diverse questions, including
those about politics. As a matter of fact, the collection period was followed by an accident with
the Chapecoense soccer team, which generated a great commotion worldwide, thus somewhat
affecting the statistical results shown here.
13
-0,16
0,1
-0,12
-0,01
0,46
-0,01
0,1
-0,27
-0,42
-0,01 -0,07
-0,16 -0,17
0 0,05
-0,57
0,36
Term: Brazil
Correlation per day
14. 3. Results: second stage
In an analysis of the tweets collected over the seventeen days for each of the terms, the
percentage of tweets that were classified as negative on each of the bases was checked. Notice
the large volume of negative tweets for the database with the term "Michel Temer". This result
gives us a more complete understanding of the situation and confirms the tendency of a
greater dissemination of tweets classified as negative (volume of retweets), which have the
term "Michel Temer".
14
75%
25%
Percentage of negative tweets per search term
Michel Temer Brazil
15. 3. Results: third stage
The focus of this study is Brazil's current political scenario. The graph below shows the
percentage of all tweets collected during the first 17 days of December that have the term
"Brazil", which were categorized as Law, Gov't & Politics, according to the operator Categorize of
the Rosette Text Toolkit.
15
2%
98%
Distribution by category
Law, Gov't & Politics Others categories
16. 3. Results: third stage
This graph demonstrates the analysis of tweets categorized as Law, Gov't and Politics and their
behavior with respect to the classification of feelings. The operators of Categorize and Analyze
Sentiment were used, showing that in this category, tweets classified as negative are
predominant.
16
20%
18%
62%
Classifier: Law, Gov't & Politics tweets
Pos Neu Neg
17. 3. Results: third stage
The graph below shows the combination of two operators of the Rosette Tool Kit, allowing,
after the process of categorizing the tweets, to extract the most relevant entities of a certain
category. The combination of these operators allows a detailed understanding of the proposed
scenario and can be a powerful tool for decision making in various segments, such as press,
communication or institutional relations.
17
38%
24% 24%
Senate President Supreme Court
Key entities found in tweets categorized as Law, Gov't &
Politics
Key entities found in tweets categorized as Law, Gov't & Politics
18. 4. Conclusions
By making the combined use of RapidMiner with the Rosette Text Toolkit, we have been able to
create a powerful way to analyze data in text, especially social networks. The use of both tools allows
professionals from different areas, who have the need to analyze this type of information, to do so
with low levels of technical knowledge.
Focusing on the result and the analyzes made possible by using the Rosette Text Toolkit gives you
the chance to analyze social media data much more deeply than other tools with standardized
reports. It is possible to continuously create new metrics and think about the process of knowledge
discovery in the database. As an example, we can mention the search for patterns in tweets that are
classified as negative, that belong to a certain category, with the predominance of a specific entity.
In terms of decision-making, the combination of the two tools, with the methodologies developed
in this work, allows real-time evaluation of people's reaction to a given political scenario. Through
these analyzes one can shape institutional campaigns, measure public interests or even model
speeches and other communications in line with people's yearnings.
18
19. 5. Final remarks
This work was developed by Delano Lima, graduated in Advertising and postgraduate in Marketing
Management by UNIFOR - University of Fortaleza. Validation of the process of converting the feeling
classifications from text to numerical data was done in collaboration with Prof. Andrey Chaves, from
the Department of Physics of Universidade Federal do Ceará (UFC), PhD in Physics by UFC and
University of Antwerp, in Belgium, with a post-doc period at Columbia University, USA.
For any question about this work, please contact Mr. Delano Lima at delano@miningmetrics.net. For
specific questions about the process of converting sentiment classifiers into text to numbers, please
contact Professor Andrey Chaves at andrey@fisica.ufc.br.
19