This research presentation provides an overview of ongoing research into data mining Twitter to determine the sentiment, insight and knowledge found in tweets. It covers an overview of the research project, a review of my research questions and hypotheses, and a review of the theoretical basis for this research.
This research has been completed. You can view the outcome of this research at http://www.slideshare.net/ericbrown/these-slides-cover-the-final-defense-presentation-for-my-doctorate-degree-the-topic-analysis-of-twitter-messages-for-sentiment-and-insight-for-use-in-stock-market-decision-making
Also, check out my website and service for investing in the markets using Twitter Sentiment at http://tradethesentiment.com
The research proposed herein will be my Doctoral Dissertation Topic.
Understanding Twitter Sentiment for Investing Decisions
1. Analysis of Twitter Messages for
Sentiment and Insight for use in Stock
Market Decision Making
Eric D. Brown
Dakota State University
2. Introduction
• Agenda
• Background
• Literature Review
• Research Summary & Model
• Research Questions & Hypotheses
• Contributions
• Research Methods
• Preliminary Results
• Discussion & Challenges
• Conclusions & Future Work
3. Background
• Sentiment has long been an underlying factor in the investing
world
• Consumer Confidence Index
• Investors Intelligence Sentiment Index
• “Market Sentiment”
• Rather than waiting days, weeks or months, can the
‘sentiment of now’ be used to improve trading performance
and investing decisions?
• Can Twitter be used to determine the ‘sentiment of now’?
4. Background
The thoughts driving this research are:
• Can analysis of publicly available Twitter Messages provide
insight for decision making for investing?
• Do Twitter messages (and their subsequent sentiment) have
any effect on movement in the stock market?
• Can Twitter messages be mined and analyzed to predict
movements in the stock market?
• Does a Twitter user’s reputation have an effect on how people
perceive and use their shared investing ideas?
5. Literature Review
• Wysocki (1998) – Strong positive correlation between volume
of messages posted on message boards overnight and next
day’s trading volume and stock returns
• Tumarkin and Whitelaw (2001) – concluded that there are no
predictive capabilities found within message board activity
• Antweiler and Frank (2004) – Used sentiment analysis to
show strong positive correlation between message board
posts and next day trading volume and volatility. Showed
minor correlation between message board posts and next day
price activity.
6. Literature Review
• Gu et al. (2006) – Found that aggregation of individual
recommendations on stock message boards has no
predictive power on future stock returns
• Das and Chen (2007) – Using sentiment analysis of messages
on message boards, found no correlation between sentiment
and individual stock price movement but did find positive
correlation between the aggregate sentiment of a set of
stocks and movement in the stock market
• Zhang (2009) – Studied the reputation of message board
posters and showed that posts from users with a ‘better’
reputation were shared more and had a greater effect on sentiment
7. Literature Review
• Bollen, Mao & Zeng (2010) – Using sentiment
analysis, determined the ‘mood’ of the Twitter universe and
then predicted the next-day movement of the Dow Jones
Industrial Average with 87.6% accuracy.
• This model is being used by a Hedge Fund to actively trade. The
first month of trading showed profitability
• Sprenger and Welpe (2010) – Focused on the S&P 100 stocks
and the sentiment of those stocks. Showed that sentiment of
the company on Twitter closely follows market movements.
Also showed positive correlation between trading volume and
message volume on Twitter for that company.
• Research laid the groundwork for TweetTrader.net
8. Literature Review
• Vincent & Armstrong (2010) – Undertook a research project
to understand Twitter ‘buzz’ and measure the ‘change of
context’ within Twitter messages. They called this change of
context the ‘breaking point’: when messages turn from
Bullish to Bearish (and vice versa).
• Using these ‘breaking points’, a profitable automated trading
system was developed to trade the Forex market.
• Saavedra, Hagerty & Uzzi (2011) – Studied a proprietary trading
firm to determine if ideas shared through instant messaging
platforms lead to increased performance (measured in terms
of profitability).
9. Literature Review
• Additional research in Sentiment Analysis of Twitter:
• Bifet & Frank, 2010 - Sentiment Knowledge Discovery in Twitter
Streaming Data
• Pak & Paroubek, 2010 - Twitter as a Corpus for Sentiment Analysis
and Opinion Mining
• Romero, Meeder, & Kleinberg, 2010 - Differences in the
Mechanics of Information Diffusion Across Topics: Idioms,
Political Hashtags, and Complex Contagion on Twitter
• Castillo, Mendoza & Poblete, 2010 - Information Credibility on
Twitter
• Diakopoulos & Shamma, 2010 - Characterizing Debate
Performance via Aggregated Twitter Sentiment
10. Research Summary
• The goal of this research is a more thorough understanding of
Twitter users and their sharing of investing ideas and how
those ideas can or should be used in investing decisions
• If Twitter messages do convey some form of sentiment which
is correlated to activity in the stock market, can this sentiment
be used in a predictive manner?
• Can this large distributed network of users be ‘tapped’ to build
a decision support system for the generation of investing
ideas?
11. Research Summary
• This research will attempt to:
• Study how individuals use Twitter to share and consume
knowledge in support of their investing decisions
• Determine whether correlation exists between the sentiment of a
Tweet and movement in the stock market
• Determine whether there are times of day (or days of the week)
that provide more ‘weight’ toward sentiment
• Understand how a user’s reputation might affect the sentiment of
a company or sector
12. Research Model
[Figure: research model diagram. Recoverable components and hypothesis labels:]
• Twitter Sentiment Analysis for Stocks and Sectors
• Stock & Sector Analysis (H1a, H1b, H1c)
• Sentiment Weighting within Sectors (H2a, H2b)
• Day / Time Analysis (H3)
• User Reputation: Social Analysis of Twitter Users (H4a, H4b)
• Information Content of Twitter Messages
• Correlations with Stock Market Movement
• Predictive Nature of Shared Tweets
• How users might use Twitter for Decision Support in investing
13. Research Questions & Hypotheses
• RQ-1: Using a given sector of the stock market, does the
sentiment for that sector match the weighted sentiment for
the stocks within that sector? How well does the sentiment
predict price and volume movement?
• H1a: The sentiment of a sector will match the overall
averaged sentiment of all stocks within the sector.
• H1b: The sentiment of a sector can be used to predict the
movement of all stocks in that sector.
• H1c: The sentiment of a sector or stock on any given day will
provide a prediction for the next day’s movement in that stock
or sector
14. Research Questions & Hypotheses
• RQ-2: Are there specific stocks within a given sector that
supply the majority of the sentiment for that sector? If so, do
these stocks supply sentiment in correlation to the weighting
of those stocks in the sector?
• H2a: The sentiment of a stock within a sector will affect the
sentiment of the sector based on the relative weighting of that
stock within the sector.
• H2b: The stocks that provide the most weight toward the
sentiment of a sector are also the stocks with the highest number
of mentions on Twitter.
15. Research Questions & Hypotheses
• RQ-3: Are there times of the day or days of the week that
provide a more accurate and informative sentiment for a stock
or sector?
• H3: Messages sent during non-market hours (i.e., evenings and
weekends) will have the most effect on sentiment for the
following day
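To test H3, each tweet first has to be assigned to a market-hours or non-market-hours bucket. The following is a minimal sketch of that step, not the method used in this research. It assumes timestamps are already expressed in US Eastern time, treats 9:30 to 16:00 on weekdays as the regular session, and ignores market holidays; the function names are hypothetical.

```python
from datetime import datetime, time

# Assumption: timestamps are already in US Eastern time, and regular
# trading hours are 9:30-16:00 on weekdays (market holidays are ignored).
MARKET_OPEN = time(9, 30)
MARKET_CLOSE = time(16, 0)


def is_market_hours(ts: datetime) -> bool:
    """Return True if ts falls inside a regular weekday trading session."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return MARKET_OPEN <= ts.time() < MARKET_CLOSE


def split_by_session(tweets):
    """Split (timestamp, score) pairs into market-hours and off-hours lists."""
    buckets = {"market": [], "off_hours": []}
    for ts, score in tweets:
        key = "market" if is_market_hours(ts) else "off_hours"
        buckets[key].append(score)
    return buckets
```

The two buckets can then be scored separately and compared against the following day's movement.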
16. Research Questions & Hypotheses
• RQ-4: Are there specific users that provide more ‘weight’ to a
sentiment of a stock or sector based on the users’ reputation?
Do retweets by these users (or of these users’ tweets) provide
more weight for the sentiment of a stock or sector?
• H4a: The number of followers of a Twitter user determines the
effect that users’ tweets will have on sentiment for a stock or
sector.
• H4b: A message sent or retweeted by a user with a large number
of Twitter followers will provide more weight toward the
sentiment of a stock or sector.
17. Contributions
• Extends the body of knowledge for Sentiment Analysis of
Twitter for decision support in the investing domain
• Extends the body of knowledge regarding the information
content of Twitter messages and how users might use that
knowledge for decision support
• Provides a better understanding of how a user’s reputation
affects the sharing of their information
• Builds a text corpus that can be used in future sentiment
analysis research for Twitter messages
18. Research Method
[Figure: research method flow diagram. Recoverable steps:]
• Twitter Data Collection and Stock Market Data Collection
• Sentiment Analysis, Price & Volume Analysis, and Social Analysis
• Positive correlation of sentiment and message volume with price/volume
• Reputation of Twitter user
• Understanding of the predictive capabilities of Twitter Sentiment and the effect of
user reputation on investing decision support
19. Research Method
• Data Collection
• Using Twitter API to collect tweets (tweet, sender, date, time)
• Tweets referencing companies and sectors are collected and stored
in a MySQL database for future study
• Using the nomenclature made popular by StockTwits
(www.stocktwits.com). Example: The stock symbol for Apple is AAPL.
Users following the StockTwits nomenclature add a “$” to the symbol
– “$AAPL”.
• StockTwits.com describes its purpose as a place to:
• …share ideas, market insights and trades on stocks, futures and the market
in general.
• Using Yahoo Finance data feed to gather Stock Market data (price
and volume)
• Provides historical data
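The “$AAPL” nomenclature described above makes it straightforward to pull ticker symbols out of tweet text. A minimal sketch follows; the regular expression, the one-to-five-letter limit and the function name are assumptions for illustration, not part of the collection prototype.

```python
import re

# A "$" followed by 1-5 letters, e.g. $AAPL. The length limit is an
# assumption that fits typical US ticker symbols.
CASHTAG = re.compile(r"\$([A-Za-z]{1,5})\b")


def extract_symbols(text: str) -> list[str]:
    """Return the upper-cased ticker symbols mentioned in a tweet, in order."""
    return [m.upper() for m in CASHTAG.findall(text)]
```

Dollar amounts such as “$5” are skipped because the pattern requires letters immediately after the “$”.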
20. Research Method
• Sentiment Analysis
• Using a Naïve Bayesian text classification algorithm to determine
sentiment of collected Tweets
• Naïve Bayesian is being used for simplicity but also because many
researchers have pointed out very minor differences between it and
other sentiment analysis methods
• A subset of the data collected will be manually assigned ‘sentiment’ to
build the necessary training dataset
• Using the R programming language combined with Python and the
WEKA Data Mining Platform, a text classification algorithm will be
implemented to determine sentiment
• For each tweet, the overall score is calculated and assigned.
• Ideally, tweets will fall into +1 (Bullish), 0 (Neutral), -1 (Bearish)
buckets.
• Currently, tweet sentiments are summed without being
normalized
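The classification step can be illustrated with a small from-scratch Naïve Bayes classifier. This is a sketch only: the research itself names R, Python and WEKA as the toolchain, and the class below, its add-one smoothing, its whitespace tokenizer and the five training tweets are all hypothetical stand-ins for the manually labelled training dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    logp += math.log((counts[word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical hand-labelled training tweets: +1 bullish, 0 neutral, -1 bearish.
train = [
    ("buying more $XLE strong breakout", 1),
    ("great earnings long $KO", 1),
    ("$XLE reports earnings tomorrow", 0),
    ("selling $XLE weak outlook", -1),
    ("short $PM ugly chart", -1),
]
model = NaiveBayes().fit(*zip(*train))

# Un-normalized daily score: the sum of the per-tweet labels.
daily_score = sum(model.predict(t) for t in ["strong breakout long", "weak ugly chart"])
```

Because the per-tweet labels are simply summed, a heavily mentioned stock will show a larger absolute score than a lightly mentioned one, which is the normalization issue noted above.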
21. Research Method
• Price and Volume Analysis
• Using regression and other analysis techniques, the movements
in price and volume will be compared with sentiment of stocks
and sectors to determine if any predictive capabilities exist
between sentiment, tweet message volume, price movement and
volume.
• The Autoregressive integrated moving average (ARIMA) and
Granger Causality Analysis techniques are being considered for
use in this project for modeling predictive behavior. Other
appropriate statistical techniques may also be considered
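Before fitting ARIMA or running a Granger causality test, a simple lagged correlation gives a first look at whether sentiment on one day moves with price change on the next. The sketch below is that simpler check, not either of the techniques named above; it assumes two equal-length daily series, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag=1):
    """Correlate sentiment on day t with the return on day t + lag."""
    if lag < 1 or lag >= len(sentiment):
        raise ValueError("lag must be between 1 and len(sentiment) - 1")
    return pearson(sentiment[:-lag], returns[lag:])
```

A value near zero at every lag would suggest little linear predictive relationship, although it would not rule out the non-linear effects that other models might capture.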
22. Research Method
• Social Analysis
• Using social graphs and analysis of Twitter users, determine
whether a tweet sent by a user with more followers provides
more ‘weight’ to the sentiment of the stocks mentioned in the
tweet.
• Using the concept of ‘retweets’, determine how far a user’s
tweet travels. Can any form of reputation or ‘trust’
of that user be determined from this?
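One way to give tweets from widely followed users more ‘weight’ is a follower-weighted average of the per-tweet scores. The sketch below is one possible scheme, not a result of this research: the log scaling and both function names are assumptions chosen so that very large accounts do not dominate.

```python
import math


def follower_weight(followers: int) -> float:
    """Log-scaled weight so very large accounts do not dominate (an assumption)."""
    return math.log10(followers + 10)


def weighted_sentiment(tweets):
    """Weighted mean of (score, followers) pairs; returns 0.0 for no tweets."""
    total_weight = sum(follower_weight(f) for _, f in tweets)
    if total_weight == 0:
        return 0.0
    return sum(s * follower_weight(f) for s, f in tweets) / total_weight
```

Comparing this weighted score with the unweighted one against market movement is one way the follower-count hypotheses could be examined.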
23. Preliminary Results
• A short study was conducted in May 2011 (May 2 to May 11) to
determine viability of data collection and sentiment analysis
• Two stock market sectors were chosen for data collection:
• Energy (XLE) – consists of 41 companies
• Consumer Staples (XLP) – consists of 41 companies
• Using the Twitter API and the collection prototype, a ten-day run was
initiated
• 13,000 tweets were collected for XLE, XLP and the 82 companies comprising the
sectors
• Basic Naïve Bayesian approach to determining sentiment using the R
programming language
• For a quick test, Hu and Liu’s (2004) polarity dataset was used to determine
/ assign sentiment
24. Preliminary Results
• The XLE ETF saw about a seven-point drop from a high of
$80.80 on May 2, 2011 to a low of $73.70 on May 11, 2011.
Courtesy of StockCharts.com
33. Preliminary Results
Top 5 Holdings of the XLP ETF and their Sentiment
Company / Symbol        % of ETF   Price Change ($)   Average Sentiment
Procter & Gamble / PG   14.69 %    1.58               0.418
Philip Morris / PM      9.60 %     -1.48              0.134
Wal Mart / WMT          8.00 %     1.40               0.171
Coca Cola / KO          7.48 %     0.41               0.671
Kraft Foods / KFT       5.04 %     1.40               0.254
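The figures in the table above can be used to illustrate the difference between a simple average of stock sentiment (H1a) and an average weighted by each stock's share of the ETF (H2a). This is a worked example only: it covers just the five listed holdings, so the weights are renormalized over those five, and the function names are hypothetical.

```python
# (symbol, % of ETF, average sentiment) for the top five XLP holdings above.
holdings = [
    ("PG", 14.69, 0.418),
    ("PM", 9.60, 0.134),
    ("WMT", 8.00, 0.171),
    ("KO", 7.48, 0.671),
    ("KFT", 5.04, 0.254),
]


def simple_mean(rows):
    """Unweighted mean of the sentiment column."""
    return sum(s for _, _, s in rows) / len(rows)


def weight_adjusted_mean(rows):
    """Sentiment averaged by ETF weight, renormalized over the given rows."""
    total = sum(w for _, w, _ in rows)
    return sum(w * s for _, w, s in rows) / total
```

For these five holdings the simple mean is roughly 0.330 and the weight-adjusted mean roughly 0.337, so the two measures are close here, although they could diverge across all 41 companies in the sector.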
34. Preliminary Results
• Social Analysis
• One of the users that sent a Tweet during this test was a Twitter
user named “gtotoy”.
• He has 6,502 Twitter followers
• Sent 28,869 tweets
• One Tweet by gtotoy was retweeted by 10 separate Twitter users
• According to TweetReach.com, gtotoy’s reach over the last 50 tweets
(as of Oct 26 2011 @ 12:22PM):
• 13,161 people
• 145,125 Impressions
• Will gtotoy’s tweets (and subsequent retweets) provide more
‘weight’ for a stock or sector’s sentiment?
35. Social Graph for “gtotoy”
http://twiangulate.com/search/gtotoy/inner_circle/map/my_friends-1/graph_label-names/graph_bidir_only-0/
36. Preliminary Results - Discussion
• There’s not enough data gathered over the 10-day period to
begin to answer any research questions.
• Purpose of the preliminary test was to validate the research
method and approach – data can be collected, scored and
analyzed.
• Data collection has been ongoing since May 1
37. Conclusions & Future Work
• There are some challenges to this research:
• Building the training dataset will be key
• Building a corpus of investing and trading ‘words’ for
bullish, neutral and bearish opinions
• Continued Access to Twitter API
• Will Twitter “Spam” have an effect on sentiment?
• Can sarcasm be detected? If not, how does it affect sentiment?
38. Conclusions & Future Work
• Based on the initial analysis presented, there appears to be an
interesting study to be done to determine if Twitter Sentiment
has predictive capabilities
• Next Steps:
• Dissertation Approval
• Continue data collection
• Determine Modeling and Predictive Analysis Approaches
• Complete Analysis & Research
• Write up