3. Twitter
• Popular microblogging web service
• 140-character limit per message (tweet)
• Started in 2006, over 645 million users today*
• 58 million tweets per day*
• 9,000 tweets per minute*
* Source: www.statisticbrain.com/twitter-statistics
4. Sentiment Analysis
• Identifying, extracting & processing subjective
information from source material
• Subjective information includes attitudes,
emotions & opinions
• Appropriate for binary classification (positive vs.
negative, good vs. bad, etc.)
• Useful for movie reviews, political election
opinions, etc.
5. Project Aim
Interested in exploring the relationship between:
• Length of tweet (number of characters)
AND
• Sentiment score of tweet
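One simple way to quantify a relationship between two numeric columns such as these is a Pearson correlation coefficient. The deck itself relies on clustering rather than this statistic, so the sketch below is only an illustrative alternative check; the function name `pearson` is hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)
```

Called as `pearson(lengths, scores)`, a value near 0 would suggest no linear relationship between tweet length and sentiment score, while values near +1 or -1 would suggest a strong one.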
6. Problem Description
The research project tasks:
1. Capture Twitter data
2. Build custom sentiment dictionary
3. Process tweets
4. Create dataset
5. Cluster tweet data
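Tasks 2 to 4 can be sketched as follows. This is a minimal illustration, not the project's actual scripts: the dictionary entries, their scores, and the helper names `score_tweet` and `make_row` are all assumptions made for the example.

```python
import re

# Hypothetical sentiment dictionary: term -> score (assumed values).
SENTIMENT = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def score_tweet(text, lexicon=SENTIMENT):
    """Sum the dictionary scores of a tweet's terms; unknown terms score 0."""
    terms = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(term, 0) for term in terms)

def make_row(text, lexicon=SENTIMENT):
    """Build one dataset row: (length in characters, sentiment score)."""
    return (len(text), score_tweet(text, lexicon))

# Toy tweets standing in for captured data.
tweets = ["I love this phone, great battery", "awful service", "just landed"]
dataset = [make_row(t) for t in tweets]
```

Each row pairs the two quantities the project compares, so the resulting list could be written out in whatever format the clustering tool expects.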
7. Methodology
● Custom Python scripts to capture and process live
tweets over a 4-week schedule
● Use k-means clustering in Weka to look for
natural sentiment patterns
● Is there any correlation between the length of a
tweet and its sentiment (positive/negative/neutral)?
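To make the clustering step concrete, here is a minimal plain-Python sketch of k-means (Lloyd's algorithm) on (length, score) pairs. The project used Weka, so this is a stand-in rather than the tool's implementation; the function name `kmeans`, the seed, and the toy points are assumptions. The sketch does not rescale features, so on real data the length axis (0 to 140) would dominate the distance unless the columns were normalised first.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                            + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

# Toy (length, score) rows, invented for illustration only.
points = [(20, 2), (25, 3), (30, 2), (120, -2), (130, -3), (125, -1)]
centroids, labels = kmeans(points, k=2)
```

On this toy input the three short tweets and the three long tweets end up in separate clusters; which cluster gets label 0 depends on the random starting points.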
8. Results
Sentiment scores of shorter tweets appear more
tightly centered around their cluster’s centroid
Longer tweets are less centered on the
applicable centroid
This seems intuitive: a tweet with more characters
tends to contain more terms, which increases the
chance that some of those terms are assigned a
score
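One way to put a number on "tightly centered" is the mean absolute distance of sentiment scores from a centroid's score, computed separately for short and long tweets. The sketch below is an assumption about how such a check could look, not the analysis behind the slide; the 70-character cutoff, the function name `spread_by_length`, and the rows are invented for illustration and are not project results.

```python
from statistics import mean

def spread_by_length(rows, centroid_score, cutoff=70):
    """Mean absolute distance of scores from a centroid's score,
    returned as (short_tweets, long_tweets); None if a group is empty."""
    short = [abs(score - centroid_score) for length, score in rows if length <= cutoff]
    long_ = [abs(score - centroid_score) for length, score in rows if length > cutoff]
    return (mean(short) if short else None, mean(long_) if long_ else None)

# Made-up (length, score) rows for one cluster whose centroid score is 1.
rows = [(20, 1), (35, 0), (50, 1), (90, 3), (110, -2), (135, 4)]
short_spread, long_spread = spread_by_length(rows, centroid_score=1)
```

A smaller value for the short group than for the long group would match the pattern described above.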